Queries & Working with Large Datasets

CloudQuant Data Liberator queries can return very large result sets, sometimes millions of rows, and these queries can take a long time to execute. Because data frequency varies from dataset to dataset, start with a narrow timeframe for specific symbols, and expand the scope only once you know what to expect.

Consider running a point-in-time query for one symbol to get an idea of how large your dataset is prior to running other queries.
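As a rough planning aid, you can extrapolate the row count of a larger query from a narrow sample query. The `estimate_rows` helper below is hypothetical (it is not part of the liberator package). It assumes rows are spread evenly over time and across symbols, an assumption that weekends, holidays, and uneven trading activity will break, so treat the result as an order-of-magnitude estimate only.

```python
from datetime import date


def estimate_rows(sample_rows, sample_start, sample_end,
                  full_start, full_end,
                  sample_symbols=1, full_symbols=1):
    """Linearly extrapolate a full query's row count from a narrow sample query.

    Hypothetical helper. Assumes rows are evenly distributed over calendar days
    and across symbols, which is only a rough approximation.
    """
    sample_days = (sample_end - sample_start).days
    full_days = (full_end - full_start).days
    if sample_days <= 0 or sample_symbols <= 0:
        raise ValueError("sample window and sample symbol count must be positive")
    # Average rows per symbol per calendar day observed in the sample
    rows_per_symbol_day = sample_rows / (sample_days * sample_symbols)
    return round(rows_per_symbol_day * full_days * full_symbols)


# A 10-day sample for one symbol returned 100 rows; scale to 100 days and 2 symbols
print(estimate_rows(100, date(2020, 1, 1), date(2020, 1, 11),
                    date(2020, 1, 1), date(2020, 4, 10), full_symbols=2))  # 2000
```

If the estimate runs into the millions, that is a signal to narrow the timeframe or symbol list before running the full query.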

As of release 2.0, the P12 client certificate (liberator.pfx) is no longer required. The --cert-type P12 --cert liberator.pfx flags used in curl examples are only needed for releases prior to 2.0.

Point-in-time queries

Omit the back_to parameter to receive a single point-in-time result for each specified symbol as of the as_of date/time. If you also omit as_of, CloudQuant Data Liberator defaults to the current date/time.

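import liberator

# Point-in-time query: no back_to, so each symbol gets a single result as of 2020-11-15
df = liberator.get_dataframe(
    liberator.query(
        name='daily_bars',
        as_of='2020-11-15',
        symbols=['AAPL']
    )
)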
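To make point-in-time behavior concrete, the sketch below reproduces it locally on a small list of records. It assumes that a point-in-time result is the most recent record at or before as_of for each symbol; the `point_in_time` function and the `symbol`/`timestamp`/`id` record keys are hypothetical and illustrate the idea only, not how the service is implemented.

```python
from datetime import datetime


def point_in_time(records, as_of):
    """Return the latest record at or before as_of for each symbol.

    Assumption: this mirrors what a query without back_to returns. It is a
    local illustration, not the service's implementation.
    records: iterable of dicts with hypothetical 'symbol' and 'timestamp' keys.
    """
    latest = {}
    for rec in records:
        if rec["timestamp"] > as_of:
            continue  # records after as_of are not visible at that point in time
        current = latest.get(rec["symbol"])
        if current is None or rec["timestamp"] > current["timestamp"]:
            latest[rec["symbol"]] = rec
    return latest


# Placeholder records; 'id' only identifies which record was chosen
records = [
    {"symbol": "AAPL", "timestamp": datetime(2020, 11, 12), "id": 1},
    {"symbol": "AAPL", "timestamp": datetime(2020, 11, 13), "id": 2},
    {"symbol": "AAPL", "timestamp": datetime(2020, 11, 16), "id": 3},
]
snapshot = point_in_time(records, datetime(2020, 11, 15))
print(snapshot["AAPL"]["id"])  # 2
```

The record stamped 2020-11-16 is ignored because it lies after as_of, and of the remaining two, the later one wins.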

Time series queries

Include a back_to parameter with a date/time earlier than your as_of value to receive each symbol's data over the range between the two. If as_of is omitted, it defaults to the current date/time.

import liberator

# Time series query: daily bars from 2018-11-15 up to 2020-11-15 for each symbol
df = liberator.get_dataframe(
    liberator.query(
        name='daily_bars',
        as_of='2020-11-15',
        back_to='2018-11-15',
        symbols=['FB', 'AAPL', 'NFLX', 'GOOG', 'MSFT', 'IBM']
    )
)
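For long time series across many symbols, one way to keep each request manageable, in line with starting narrow, is to split the back_to/as_of range into consecutive windows and run one query per window. The `query_windows` helper below is hypothetical. The commented loop shows how the windows might be passed to the same `liberator.query` call used above, formatting each bound in the 'YYYY-MM-DD HH:MM:SS' form this page uses. Whether the service treats the bounds as inclusive is an assumption to verify: if both ends are inclusive, a record stamped exactly on a window boundary can appear in two windows, hence the deduplication step.

```python
from datetime import datetime, timedelta


def query_windows(back_to, as_of, step):
    """Split the range back_to..as_of into consecutive (start, end) windows no longer than step."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    start = back_to
    while start < as_of:
        end = min(start + step, as_of)
        windows.append((start, end))
        start = end  # next window starts where this one ended, leaving no gaps
    return windows


windows = query_windows(datetime(2018, 11, 15), datetime(2020, 11, 15), timedelta(days=180))
for start, end in windows:
    print(start.strftime("%Y-%m-%d %H:%M:%S"), "->", end.strftime("%Y-%m-%d %H:%M:%S"))

# Hypothetical usage with the liberator client (not executed here):
# frames = [
#     liberator.get_dataframe(liberator.query(
#         name='daily_bars',
#         back_to=start.strftime("%Y-%m-%d %H:%M:%S"),
#         as_of=end.strftime("%Y-%m-%d %H:%M:%S"),
#         symbols=['AAPL'],
#     ))
#     for start, end in windows
# ]
# df = pandas.concat(frames).drop_duplicates()  # drop rows repeated at window boundaries
```

With a 180-day step, the two-year range above splits into five windows, the last one covering the remaining 11 days. A smaller step means more requests but less data per request.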

The back_to and as_of values always include a time component, even if you do not specify one. A date-only value is treated as midnight at the start of that day: as_of='2023-01-15' is equivalent to as_of='2023-01-15 00:00:00', so records timestamped later on January 15 fall after as_of. Specify the full date and time when your results depend on the time of day.
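The sketch below uses Python's standard datetime module to show the effect: a date-only string parses to midnight, so a record stamped later that day compares as later than as_of. The `end_of_day` helper is hypothetical and assumes the service accepts the 'YYYY-MM-DD HH:MM:SS' format shown above. Note that a 23:59:59 bound would still miss any records with sub-second timestamps after that instant.

```python
from datetime import datetime, time


def end_of_day(day):
    """Hypothetical helper: turn 'YYYY-MM-DD' into an as_of string covering the whole day."""
    d = datetime.strptime(day, "%Y-%m-%d")
    return datetime.combine(d.date(), time(23, 59, 59)).strftime("%Y-%m-%d %H:%M:%S")


# A date-only value is interpreted as midnight at the start of that day
as_of = datetime.strptime("2023-01-15", "%Y-%m-%d")
print(as_of)  # 2023-01-15 00:00:00

# A record stamped later that same day falls after a date-only as_of
record_ts = datetime(2023, 1, 15, 9, 30)
print(record_ts <= as_of)  # False

# An explicit end-of-day bound includes it
print(end_of_day("2023-01-15"))  # 2023-01-15 23:59:59
```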
