
Batch Downloading data from Liberator (FAQ-p12)

Batch downloading is a useful method for API users who need to download very large datasets.

It provides a number of advantages:

  • Rather than waiting for the whole query to complete on the server before delivery, the data is delivered as a stream.
  • Batching lets you write the query results to a file as they arrive, so if your connection is interrupted you can find out how far you got and restart from there.
  • Simple query-to-dataframe downloads are easy to use and work well for small downloads, but for larger downloads they often use double the memory, which can max out a user's PC and cause a major slowdown. Batching allows even the largest datasets to be downloaded.
  • The code is no longer than a standard "query to dataframe" followed by "dataframe to csv".
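
As a rough illustration of the restart point mentioned in the list above, the sketch below reads the last complete line of a partially written CSV so you can see which row the download reached. The helper name `last_complete_row` and the file name are hypothetical, and it assumes rows are newline-terminated text. What you then pass as `back_to` depends on the columns in your dataset.

```python
import os

def last_complete_row(path, chunk_size=4096):
    """Return the last newline-terminated line of a text file, or None.

    Hypothetical helper: a trailing partial line (for example from an
    interrupted write) is ignored because it has no terminating newline.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return None
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        # Read backwards in chunks until the tail holds two newlines:
        # the one ending the last complete line and the one before it.
        while pos > 0 and tail.count(b"\n") < 2:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
    # Everything after the final newline is a partial (or empty) line.
    complete = tail.split(b"\n")[:-1]
    if not complete:
        return None
    return complete[-1].rstrip(b"\r").decode("utf-8")

# Example: inspect how far a previous run got (file name is illustrative)
print(last_complete_row("minute_bar_data.csv"))
```

Reading from the end of the file avoids loading a multi-gigabyte CSV into memory just to find its last row.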

Small batch downloading processes

These two short scripts each submit a single large query and write to a file as the data arrives in batches from Liberator. The second version also checks the length of each batch (it will not write a file if no data is returned) and writes the header only once.

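import liberator, time

start_time = time.time()

# Each batch is appended to the file as it arrives. Note that this simple version
# writes the column header with every batch; the second version below avoids that.
for batch in liberator.query(name='minute_bars', symbols=None, as_of='2024-07-01', back_to='2024-01-01'):
    batch.to_pandas().to_csv("minute_bar_data.csv", mode='a')

print("The query + saving took", (time.time() - start_time)/60.0, "minutes to run")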

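import liberator, time

start_time = time.time()
header_written = False  # becomes True after the first non-empty batch is saved

for batch in liberator.query(name='daily_bars', as_of='2024-07-01', back_to='2024-01-01', symbols=None):
    if not len(batch):
        continue  # skip empty batches; no file is created if nothing is returned
    # The first write creates (or overwrites) the file with a header; later writes append without one
    batch.to_pandas().to_csv('daily_bars.csv', mode='a' if header_written else 'w', header=not header_written)
    header_written = True

print("The query + saving took", (time.time() - start_time)/60.0, "minutes to run")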
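
The pattern in the second script can be wrapped in a small reusable function. This is a minimal sketch rather than part of the Liberator API: `write_batches` is a hypothetical name, and it assumes only what the scripts above rely on, namely that each batch supports `len()` and `to_pandas()`.

```python
def write_batches(batches, path):
    """Stream batches to one CSV file, writing the header only once.

    Returns the number of non-empty batches written.
    """
    written = 0
    for batch in batches:
        if not len(batch):
            continue  # skip empty batches so no empty file is created
        batch.to_pandas().to_csv(
            path,
            mode="a" if written else "w",  # overwrite on the first write, append afterwards
            header=not written,            # header only on the first write
        )
        written += 1
    return written

# Usage (requires the liberator package and a working connection):
# import liberator
# write_batches(
#     liberator.query(name='daily_bars', as_of='2024-07-01', back_to='2024-01-01', symbols=None),
#     'daily_bars.csv',
# )
```

Counting written batches rather than using the loop index means an empty first batch cannot cause the header to be skipped.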

 

Larger Batch Download Process

This code downloads a very large dataset by splitting the query into one-month date ranges.
Because the ranges are plain calendar months, it does not use mcal.

# Define the start and end dates and other setup info for the loop
###########################################################################################
start_year = 2018
start_month = 3
end_year = 2024
end_month = 6  # plain integer; a leading zero (06) is a syntax error in Python 3
#symbols = ['AAPL','GOOG']
symbols = None
header = True  # Put headers in files, True or False
individual = False  # Save into individual files with date ranges in the filenames
header_in_individual = False  # Add a header to each individual file, if individual files are selected
dataset = 'minute_bars'
###########################################################################################

import liberator, time

back_to_year = start_year
back_to_month = start_month
any_written = False  # True once any data has been saved (controls the one-off header)

while (back_to_year < end_year) or (back_to_year == end_year and back_to_month <= end_month):
    # as_of is the first day of the following month, rolling over the year after December
    if back_to_month == 12:
        as_of_month = 1
        as_of_year = back_to_year + 1
    else:
        as_of_month = back_to_month + 1
        as_of_year = back_to_year
    back_to = f"{back_to_year:04d}-{back_to_month:02d}-01"
    as_of = f"{as_of_year:04d}-{as_of_month:02d}-01"
    print(f"Downloading {dataset} {back_to} to {as_of}", end='')
    start_time = time.time()
    month_written = False  # True once this month's data has started to be saved
    for batch in liberator.query(name=dataset, as_of=as_of, back_to=back_to, symbols=symbols):
        if not len(batch):
            continue
        # The header goes on the very first non-empty batch only
        first_overall = bool(header) and not any_written
        if individual:
            # With header_in_individual set, the first batch of each monthly file also gets a header
            write_header = first_overall or (bool(header_in_individual) and not month_written)
            batch.to_pandas().to_csv(dataset+'_'+back_to+'_'+as_of+'.csv', mode='a', header=write_header)
        else:
            batch.to_pandas().to_csv(dataset+'_all.csv', mode='a', header=first_overall)
        any_written = month_written = True
    print(" Query + save took", (time.time() - start_time)/60.0, "minutes to run.")
    back_to_year = as_of_year
    back_to_month = as_of_month
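
The month stepping in the loop above can also be expressed as a small generator, which keeps the date arithmetic separate from the download code. This is an illustrative sketch: `month_ranges` is a hypothetical helper name, not part of Liberator, and it produces the same `back_to` / `as_of` strings as the loop, with the end month included.

```python
def month_ranges(start_year, start_month, end_year, end_month):
    """Yield (back_to, as_of) date strings, one pair per calendar month.

    back_to is the first day of the month and as_of is the first day of
    the following month; the end month itself is included.
    """
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        # Roll over to January of the next year after December
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        yield f"{year:04d}-{month:02d}-01", f"{next_year:04d}-{next_month:02d}-01"
        year, month = next_year, next_month

for back_to, as_of in month_ranges(2023, 11, 2024, 2):
    print(back_to, as_of)
# 2023-11-01 2023-12-01
# 2023-12-01 2024-01-01
# 2024-01-01 2024-02-01
# 2024-02-01 2024-03-01
```

Each pair can then be passed as the `back_to` and `as_of` arguments of `liberator.query`, exactly as in the loop above.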