Batch Downloading Data
Batch downloading is the method of choice when you need to retrieve very large datasets. This approach offers several advantages over standard query-to-DataFrame methods.
Key Advantages
- Streamed delivery — Data arrives in batches as the server produces it, rather than waiting for the complete result to be assembled server-side
- Resumable downloads — Each batch is written to file as it arrives, so an interrupted download can be restarted from a known point
- Memory efficiency — Avoids the double memory usage of materializing a full DataFrame before saving it
- Simplicity — The code is no longer than a standard query-to-DataFrame approach
Small Batch Download Example
The simplest implementation submits a single large query and writes batches to file:
import liberator, time

start_time = time.time()
for batch in liberator.query(name='minute_bars', symbols=None,
                             as_of='2024-07-01', back_to='2024-01-01'):
    batch.to_pandas().to_csv('minute_bar_data.csv', mode='a')
print("The query + saving took", (time.time() - start_time) / 60.0, "minutes to run")
An enhanced version prevents empty files and handles headers correctly:
import liberator, time

start_time = time.time()
first = True
for batch in liberator.query(name='daily_bars', as_of='2024-07-01',
                             back_to='2024-01-01', symbols=None):
    if not len(batch):
        continue  # skip empty batches so no file is created for an empty result
    # Overwrite and write the header only on the first non-empty batch
    batch.to_pandas().to_csv('daily_bars.csv', mode='w' if first else 'a',
                             header=first)
    first = False
print("The query + saving took", (time.time() - start_time) / 60.0, "minutes to run")
Large Batch Download Process
For very large datasets, splitting the download into monthly chunks keeps memory and server-side resource usage bounded:
import liberator, time

# Define the start and end dates and other setup info for the loop
start_year = 2018
start_month = 3
end_year = 2024
end_month = 6              # note: a leading-zero literal like 06 is a SyntaxError in Python 3
symbols = None             # None means all symbols
header = True              # write a header row in the combined file
individual = 0             # set to 1 to write one file per month instead of one combined file
header_in_individual = 0   # set to 1 to write a header row in each monthly file
dataset = 'minute_bars'

back_to_year = start_year
back_to_month = start_month
old_month = None
while (back_to_year < end_year) or (back_to_year == end_year and back_to_month <= end_month):
    # as_of is the first day of the month after back_to
    if back_to_month == 12:
        as_of_month = 1
        as_of_year = back_to_year + 1
    else:
        as_of_month = back_to_month + 1
        as_of_year = back_to_year
    back_to = f"{back_to_year:04d}-{back_to_month:02d}-01"
    as_of = f"{as_of_year:04d}-{as_of_month:02d}-01"
    print(f"Downloading {dataset} {back_to} to {as_of}", end='')
    start_time = time.time()
    head = header if old_month is None else False   # header only before the first month
    for batch in liberator.query(name=dataset, as_of=as_of,
                                 back_to=back_to, symbols=symbols):
        if not len(batch):
            continue
        if individual:
            batch.to_pandas().to_csv(dataset + '_' + back_to + '_' + as_of + '.csv',
                                     mode='a', header=header_in_individual or head)
        else:
            batch.to_pandas().to_csv(dataset + '_all.csv', mode='a', header=head)
        head = False   # write the header at most once, not once per batch
    print(" Query + save took", (time.time() - start_time) / 60.0, "minutes to run")
    back_to_year = as_of_year
    back_to_month = as_of_month
    old_month = as_of_month
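The month-advance arithmetic in the loop above is the easiest part to get wrong, so it can help to factor it into a small generator that can be tested in isolation (plain Python; liberator is not involved):

```python
def month_ranges(start_year, start_month, end_year, end_month):
    """Yield (back_to, as_of) date-string pairs, one per calendar month,
    where as_of is the first day of the following month."""
    y, m = start_year, start_month
    while (y, m) <= (end_year, end_month):
        # Roll over to January of the next year after December
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        yield f"{y:04d}-{m:02d}-01", f"{ny:04d}-{nm:02d}-01"
        y, m = ny, nm
```

For example, `list(month_ranges(2023, 11, 2024, 1))` spans the year boundary and yields three pairs, ending with `('2024-01-01', '2024-02-01')`.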
This monthly chunking approach allows downloading even the largest datasets without memory constraints, making it ideal for production data pipelines.