Python Queries & Working with Large Data Sets

The Liberator service can return very large result sets, often measured in millions of rows, and these queries may take a long time to run. Every dataset has a different data frequency. We suggest using short time frames and a limited set of symbols until you become familiar with a dataset; some of our datasets are very large indeed, and an over-extended request may return an extremely large amount of data.

💡 Best Practice: Consider running a point-in-time query for one symbol to get an idea of how large your dataset is prior to running other queries.
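The sizing check above can be sketched as back-of-the-envelope arithmetic. This is a minimal sketch, not a definitive recipe: the live liberator calls are shown in comments so the arithmetic runs stand-alone, and the row count, symbol count, and date range are hypothetical placeholders.

```python
# Sketch: query one symbol at a single point in time, then extrapolate
# before running the full request. In a live session you would obtain
# rows_for_one_symbol from the point-in-time result, e.g.:
#
#   import liberator
#   df = liberator.get_dataframe(
#       liberator.query(name='daily_bars', as_of='2020-11-15', symbols=['AAPL']))
#   rows_for_one_symbol = len(df)

rows_for_one_symbol = 1    # assumption: one row per symbol per day for daily bars
n_symbols = 500            # hypothetical symbol list size
trading_days = 2 * 252     # roughly two years of trading days

estimated_rows = rows_for_one_symbol * n_symbols * trading_days
print(estimated_rows)      # rough row count for the full time-series request
```

If the estimate looks unmanageable, narrow the date range or the symbol list before running the full query.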

In the vast majority of cases, dataset queries are either for a point in time or for a time series.

As such, you should aim to become familiar with both query types early in your use of the API.

Typical Query

This is how a typical CloudQuant query looks in Python. We have also included the conversion to a dataframe, as this will be the most common usage:

df = liberator.get_dataframe(liberator.query(name = 'daily_bars', as_of = '2020-11-15',
        back_to = '2018-11-15', symbols = ['FB', 'AAPL', 'NFLX', 'GOOG', 'MSFT', 'IBM']))

Point-In-Time queries

To specify a point-in-time query, simply drop the "back_to" parameter; you will receive data for a single point in time, based on the "as_of" date/time, for each symbol specified.

If you also drop the "as_of" parameter, Liberator will assume the current date/time and fetch the most recent point-in-time data.
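A sketch of the two point-in-time variants, using the parameter names shown in the typical query above. The live liberator calls are commented out so the snippet stands alone; the symbol list is illustrative.

```python
# Point-in-time query: an explicit as_of, no back_to.
point_in_time = dict(name='daily_bars', as_of='2020-11-15',
                     symbols=['FB', 'AAPL', 'NFLX'])

# Most-recent data: drop as_of as well, and Liberator defaults
# to the current date/time.
latest = dict(name='daily_bars', symbols=['FB', 'AAPL', 'NFLX'])

# In a live session:
# import liberator
# df = liberator.get_dataframe(liberator.query(**point_in_time))
```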

Time Series queries

To specify a time series query, add a “back_to” parameter to your query.

The "back_to" value must be a timestamp prior to your "as_of" parameter.

If you exclude the "as_of" parameter, it defaults to now, the current point in time.