
Batch Downloading Data

Batch downloading is an efficient method for API users who need to retrieve very large datasets. This approach offers several advantages over standard query methods.

Key Advantages

  • Streamed delivery — Data arrives as a stream rather than waiting for complete server-side processing
  • Resumable downloads — Write data to file as it arrives, allowing restart points if connection interrupts
  • Memory efficiency — Avoids double memory usage common with standard DataFrame downloads
  • Simplicity — Code length is comparable to standard query-to-DataFrame approaches
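The resumable-download advantage can be exploited with a small helper that inspects the partially written file before re-submitting a query: read the last timestamp already on disk and use it as the new `back_to`. A minimal sketch, assuming the CSV has a `timestamp` column in `YYYY-MM-DD …` format (the column name and `resume_point` helper are illustrative, not part of the liberator API):

```python
import os
import pandas as pd

def resume_point(path, ts_col='timestamp', default='2024-01-01'):
    """Pick the back_to date for restarting an interrupted download.

    If a partial CSV exists, resume from the last timestamp written to it;
    otherwise start from the default date.
    """
    if not os.path.exists(path):
        return default
    # Read only the timestamp column to keep memory usage low
    last = pd.read_csv(path, usecols=[ts_col])[ts_col].iloc[-1]
    return str(last)[:10]  # keep just the YYYY-MM-DD part
```

The returned date can then be passed as `back_to=` to the query, with `mode='a'` on the CSV writes so the same file continues to grow; note that resuming at the last timestamp may duplicate rows at the boundary, so deduplicate afterwards if exact counts matter.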

Small Batch Download Example

The simplest implementation submits a single large query and writes batches to file:
import liberator, time

start_time = time.time()

# Note: this naive version appends every batch with mode='a', so the CSV
# header is repeated before each batch and a re-run keeps appending to
# the same file.
for batch in liberator.query(name='minute_bars', symbols=None,
                             as_of='2024-07-01', back_to='2024-01-01'):
    batch.to_pandas().to_csv('minute_bar_data.csv', mode='a')

print("The query + saving took", (time.time() - start_time) / 60.0, "minutes to run")
An enhanced version prevents empty files and handles headers correctly:
import liberator, time

start_time = time.time()

wrote_any = False  # tracks whether a non-empty batch has been written yet
for batch in liberator.query(name='daily_bars', as_of='2024-07-01',
                             back_to='2024-01-01', symbols=None):
    if not len(batch):
        continue  # skip empty batches so no header-only file is created
    # The first non-empty batch overwrites any previous file and writes the
    # header; later batches append without one.
    batch.to_pandas().to_csv('daily_bars.csv', mode='a' if wrote_any else 'w',
                             header=not wrote_any)
    wrote_any = True

print("The query + saving took", (time.time() - start_time) / 60.0, "minutes to run")
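Once saved, the file can also be processed without loading it whole, using pandas' own chunked reader. A sketch computing a column mean over an arbitrarily large CSV (the `mean_of_column` helper and column name are illustrative):

```python
import pandas as pd

def mean_of_column(path, col, chunksize=100_000):
    """Stream a large CSV in chunks and compute a column mean
    without holding the whole file in memory."""
    total = 0.0
    count = 0
    for chunk in pd.read_csv(path, usecols=[col], chunksize=chunksize):
        total += chunk[col].sum()
        count += len(chunk)
    return total / count
```

The same chunked pattern extends to filtering or re-aggregating the downloaded data before it ever touches a full in-memory DataFrame.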

Large Batch Download Process

For very large datasets, splitting the download into monthly chunks keeps memory use and file sizes manageable:
# Define the start and end dates and other setup info for the loop

start_year = 2018
start_month = 3
end_year = 2024
end_month = 6              # note: a leading zero (06) is a syntax error in Python 3
symbols = None             # None requests all symbols
header = True              # write a header row at the top of the combined file
individual = 0             # 1 = one CSV per month, 0 = one combined CSV
header_in_individual = 0   # 1 = header row at the top of each monthly CSV
dataset = 'minute_bars'

import liberator, time

back_to_year = start_year
back_to_month = start_month
wrote_any = False  # has anything been written to the combined file yet?

while (back_to_year < end_year) or (back_to_year == end_year and back_to_month <= end_month):
    # as_of is the first day of the month after back_to
    if back_to_month == 12:
        as_of_month = 1
        as_of_year = back_to_year + 1
    else:
        as_of_month = back_to_month + 1
        as_of_year = back_to_year

    back_to = f"{back_to_year:04d}-{back_to_month:02d}-01"
    as_of = f"{as_of_year:04d}-{as_of_month:02d}-01"

    print(f"Downloading {dataset} {back_to} to {as_of}", end='')
    start_time = time.time()
    first_batch = True  # first non-empty batch of this month

    for batch in liberator.query(name=dataset, as_of=as_of,
                                 back_to=back_to, symbols=symbols):
        if not len(batch):
            continue
        if individual:
            # One file per month; header only on its first batch, if requested.
            batch.to_pandas().to_csv(f"{dataset}_{back_to}_{as_of}.csv", mode='a',
                                     header=bool(header_in_individual) and first_batch)
        else:
            # One combined file; header only at the very top, if requested.
            batch.to_pandas().to_csv(f"{dataset}_all.csv", mode='a',
                                     header=header and not wrote_any)
            wrote_any = True
        first_batch = False

    print(" Query + save took", (time.time() - start_time) / 60.0, "minutes to run")
    back_to_year = as_of_year
    back_to_month = as_of_month
This monthly chunking approach allows downloading even the largest datasets without memory constraints, making it ideal for production data pipelines.
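The month-stepping logic driving the loop above can also be factored into a small, independently testable generator that yields `(back_to, as_of)` pairs, where `as_of` is the first day of the month following `back_to` (the `month_ranges` name is illustrative, not part of the liberator API):

```python
def month_ranges(start_year, start_month, end_year, end_month):
    """Yield (back_to, as_of) date strings, one calendar month at a time.

    back_to is the first day of each month; as_of is the first day of the
    following month, matching the chunking used in the download loop.
    """
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        nxt_year, nxt_month = (year + 1, 1) if month == 12 else (year, month + 1)
        yield (f"{year:04d}-{month:02d}-01", f"{nxt_year:04d}-{nxt_month:02d}-01")
        year, month = nxt_year, nxt_month
```

With this helper, the download loop reduces to `for back_to, as_of in month_ranges(2018, 3, 2024, 6): ...`, and the date arithmetic can be unit-tested without touching the API.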