Queries & Working with Large Datasets

The CloudQuant Data Liberator service often returns very large result sets, sometimes millions of rows, and such queries can take a long time to run. Because data frequency varies from dataset to dataset, start with a narrow time frame and a few specific symbols, then widen the scope.
Before running larger queries, consider running a point-in-time query for a single symbol to gauge how large the dataset is.
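
One way to keep each request narrow is to split a long date range into shorter windows and query them one at a time. The sketch below is only an illustration: split_date_range and days_per_chunk are hypothetical names, not part of the Liberator API, and the 30-day default is an arbitrary choice.

```python
from datetime import date, timedelta


def split_date_range(back_to, as_of, days_per_chunk=30):
    """Split a date range into consecutive (back_to, as_of) ISO date pairs.

    Hypothetical helper, not part of the Liberator API.
    """
    start = date.fromisoformat(back_to)
    end = date.fromisoformat(as_of)
    if start >= end:
        raise ValueError("back_to must be earlier than as_of")
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")

    chunks = []
    while start < end:
        # The last window is clipped so it never runs past the requested end.
        chunk_end = min(start + timedelta(days=days_per_chunk), end)
        chunks.append((start.isoformat(), chunk_end.isoformat()))
        start = chunk_end
    return chunks


windows = split_date_range("2020-01-01", "2020-01-10", days_per_chunk=4)
```

Each pair could then be passed as back_to and as_of in a separate query. Adjacent windows share a boundary date, and this page does not say whether the bounds are inclusive, so check the combined results for duplicate rows.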

Query Types

Most dataset queries fall into two categories:
  • Point-in-time queries
  • Time series queries
Familiarity with both query types early on will improve your API usage efficiency.

Point-In-Time Queries

Omit the back_to parameter to receive single point-in-time data for each specified symbol based on the as_of date/time. If you also exclude as_of, CloudQuant Data Liberator defaults to the current date/time.
import liberator

# Point-in-time query: no back_to, so data is returned as of the given date/time.
df = liberator.get_dataframe(
    liberator.query(
        name='daily_bars',
        as_of='2020-11-15',
        symbols=['AAPL']
    )
)

Time Series Queries

Include a back_to value that is earlier than your as_of value. When as_of is omitted, it defaults to the current date/time.
# Time series query: back_to is earlier than as_of, covering two years of daily bars.
df = liberator.get_dataframe(
    liberator.query(
        name='daily_bars',
        as_of='2020-11-15',
        back_to='2018-11-15',
        symbols=['FB', 'AAPL', 'NFLX', 'GOOG', 'MSFT', 'IBM']
    )
)
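
The two query types differ only in which parameters are supplied, so the arguments can be assembled and checked before anything is sent. This is a minimal sketch under assumptions: build_query_kwargs is a hypothetical helper, not a Liberator function, and it assumes date strings in the ISO-style formats shown on this page.

```python
from datetime import datetime


def build_query_kwargs(name, symbols, as_of=None, back_to=None):
    """Assemble query keyword arguments, leaving out parameters not supplied.

    Hypothetical helper, not part of the Liberator API.
    """
    kwargs = {"name": name, "symbols": list(symbols)}
    if as_of is not None:
        kwargs["as_of"] = as_of
    if back_to is not None:
        # With as_of omitted the service defaults to the current date/time,
        # so the local clock is used here as an approximation of that bound.
        upper = datetime.fromisoformat(as_of) if as_of is not None else datetime.now()
        if datetime.fromisoformat(back_to) >= upper:
            raise ValueError("back_to must precede as_of")
        kwargs["back_to"] = back_to
    return kwargs


point_in_time = build_query_kwargs("daily_bars", ["AAPL"], as_of="2020-11-15")
time_series = build_query_kwargs(
    "daily_bars", ["AAPL", "MSFT"], as_of="2020-11-15", back_to="2018-11-15"
)
```

The resulting dictionary could be unpacked into the call as liberator.query(**kwargs), since the examples on this page pass every parameter by keyword.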
The back_to and as_of parameters always include a time component, even when you do not specify one. For example, as_of: "2023-01-15" is interpreted as as_of: "2023-01-15 00:00:00", so data timestamped later on that day falls after that moment. Specify the time explicitly when your query depends on intraday precision.
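
To avoid relying on the implicit midnight, the time can be written out before the value is passed to a query. The helper below is a hypothetical sketch, not part of the Liberator API. It assumes the "YYYY-MM-DD HH:MM:SS" form shown above is accepted, and the choice of 23:59:59 as "end of day" is an assumption about what the caller wants. Time zones are not covered on this page and are ignored here.

```python
from datetime import datetime, time


def with_explicit_time(value, end_of_day=False):
    """Return value as 'YYYY-MM-DD HH:MM:SS', making the time component explicit.

    Hypothetical helper, not part of the Liberator API.
    """
    parsed = datetime.fromisoformat(value)
    # A bare 'YYYY-MM-DD' string is exactly 10 characters long.
    has_time = len(value) > 10
    if end_of_day and not has_time:
        # Date-only input parses to midnight; move it to the last second of the day.
        parsed = datetime.combine(parsed.date(), time(23, 59, 59))
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


start_of_day = with_explicit_time("2023-01-15")
end_of_day = with_explicit_time("2023-01-15", end_of_day=True)
```

A value that already carries a time is returned unchanged apart from formatting, so the helper does not override an explicit intraday cutoff.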