Batch Downloading Data
Batch downloading is a useful technique for API users who need to retrieve very large datasets. This approach offers several advantages over traditional query methods.
Key Advantages
- Streamed delivery — Data arrives as a stream rather than waiting for complete server-side processing
- Resumable downloads — Data is written to file as it arrives, providing restart points if the connection is interrupted
- Memory efficiency — Avoids double memory usage common with standard DataFrame downloads
- Simplicity — Code length is comparable to standard query-to-DataFrame approaches
Small Batch Download Example
The simplest implementation submits a single large query and writes batches to file:
import liberator, time

start_time = time.time()
# Stream the query results and append each batch to the CSV as it arrives
for batch in liberator.query(name='minute_bars', symbols=None,
                             as_of='2024-07-01', back_to='2024-01-01'):
    batch.to_pandas().to_csv("minute_bar_data.csv", mode='a')
print("The query + saving took", (time.time() - start_time)/60.0, "minutes to run")
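One caveat of this simple version: to_csv with mode='a' writes a header row by default on every call, so the output file ends up with one header per batch. A small pandas-only sketch (no liberator required; two tiny DataFrames stand in for query batches) demonstrates the problem:

```python
import os
import tempfile

import pandas as pd

# Two stand-in "batches", as if yielded by a streamed query
batches = [pd.DataFrame({"close": [1.0]}), pd.DataFrame({"close": [2.0]})]

path = os.path.join(tempfile.mkdtemp(), "bars.csv")
for batch in batches:
    batch.to_csv(path, mode='a')   # default header=True on every append

with open(path) as f:
    lines = f.read().splitlines()
print(lines)  # [',close', '0,1.0', ',close', '0,2.0'] -- header appears twice
```

The enhanced version below avoids this by writing the header only for the first batch.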
An enhanced version prevents empty files and handles headers correctly:
import liberator, time
start_time = time.time()
wrote = False    # becomes True once the first non-empty batch has been written
for batch in liberator.query(name='daily_bars', as_of='2024-07-01',
                             back_to='2024-01-01', symbols=None):
    if not len(batch):   # skip empty batches so no header-only file is created
        continue
    # First non-empty batch overwrites ('w') and writes the header;
    # every later batch appends without one
    batch.to_pandas().to_csv('daily_bars.csv', mode='a' if wrote else 'w',
                             header=not wrote)
    wrote = True
print("The query + saving took", (time.time() - start_time)/60.0, "minutes to run")
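The first-batch-writes-the-header pattern can be factored into a small helper for reuse across scripts. This is a sketch; csv_write_args is a name introduced here for illustration, not part of liberator or pandas:

```python
def csv_write_args(first_batch):
    """Return to_csv keyword arguments for streamed batch writing.

    The first non-empty batch overwrites the file and writes the header;
    every later batch appends without a header.
    """
    return {"mode": "w" if first_batch else "a", "header": first_batch}

print(csv_write_args(True))   # {'mode': 'w', 'header': True}
print(csv_write_args(False))  # {'mode': 'a', 'header': False}
```

In the loop above, the to_csv call would become batch.to_pandas().to_csv('daily_bars.csv', **csv_write_args(not wrote)).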
Large Batch Download Process
For very large datasets, splitting downloads into monthly chunks prevents resource constraints:
# Define the start and end dates and other setup info for the loop
start_year = 2018
start_month = 3
end_year = 2024
end_month = 6                  # note: a leading zero (06) is a syntax error in Python 3
symbols = None                 # None requests all symbols
individual = False             # True: one CSV per month; False: one combined CSV
header_in_individual = False   # True: write a header row in each monthly file
dataset = 'minute_bars'

import liberator, time

back_to_year = start_year
back_to_month = start_month
wrote_combined_header = False
while (back_to_year < end_year) or (back_to_year == end_year and back_to_month <= end_month):
    # as_of is the first day of the month after back_to
    if back_to_month == 12:
        as_of_month = 1
        as_of_year = back_to_year + 1
    else:
        as_of_month = back_to_month + 1
        as_of_year = back_to_year
    back_to = f"{back_to_year:04d}-{back_to_month:02d}-01"
    as_of = f"{as_of_year:04d}-{as_of_month:02d}-01"
    print(f"Downloading {dataset} {back_to} to {as_of}", end='')
    start_time = time.time()
    wrote_monthly_header = False
    for batch in liberator.query(name=dataset, as_of=as_of,
                                 back_to=back_to, symbols=symbols):
        if not len(batch):
            continue
        if individual:
            # Write the header at most once per monthly file
            head = header_in_individual and not wrote_monthly_header
            batch.to_pandas().to_csv(f"{dataset}_{back_to}_{as_of}.csv",
                                     mode='a', header=head)
            wrote_monthly_header = True
        else:
            # Write the header only for the very first batch of the combined file
            head = not wrote_combined_header
            batch.to_pandas().to_csv(f"{dataset}_all.csv", mode='a', header=head)
            wrote_combined_header = True
    print(" Query + save took", (time.time() - start_time)/60.0, "minutes to run")
    back_to_year = as_of_year
    back_to_month = as_of_month
This monthly chunking approach allows downloading even the largest datasets without memory constraints, making it ideal for production data pipelines.
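Monthly chunking also makes interrupted runs easy to resume when each month is written to its own file: on restart, skip any month whose CSV already exists on disk. A sketch of that check, assuming the per-month file naming used above (months_to_download is a hypothetical helper, not part of liberator):

```python
import os
import tempfile

def months_to_download(months, out_dir, dataset):
    """Return the (back_to, as_of) pairs whose monthly CSV does not yet exist."""
    todo = []
    for back_to, as_of in months:
        path = os.path.join(out_dir, f"{dataset}_{back_to}_{as_of}.csv")
        if not os.path.exists(path):
            todo.append((back_to, as_of))
    return todo

out_dir = tempfile.mkdtemp()
months = [("2024-01-01", "2024-02-01"), ("2024-02-01", "2024-03-01")]
# Pretend the January download already completed before an interruption:
open(os.path.join(out_dir, "minute_bars_2024-01-01_2024-02-01.csv"), "w").close()
print(months_to_download(months, out_dir, "minute_bars"))
# [('2024-02-01', '2024-03-01')] -- only February remains
```

Note this treats any existing file as complete; a production pipeline might also compare row counts or write a sentinel file after each month finishes.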