Java: Queries & Working with Large Data Sets

The Liberator service can return very large result sets, often millions of rows, and such queries may take a long time to run. Every dataset has a different data frequency. Until you are familiar with a dataset, we suggest querying short time frames for a limited number of symbols: some of our datasets are very large, and an over-extended request may return an extremely large amount of data.

💡 Best Practice: Consider running a point-in-time query for one symbol to get an idea of how large your dataset is prior to running other queries.

In the vast majority of cases, a dataset query is either a point-in-time query or a time series query.

As such, you should aim to become familiar with both query types early in your use of the API.

Typical Query

This is how a typical CloudQuant query looks in Java. Liberator.query accepts a HashMap of query parameters and returns a unique ID. Iterating over Liberator.Generate returns the query results. Each result contains the unique ID of its query and either a JSON value or an Apache Arrow RecordBatch (a VectorSchemaRoot in the Arrow Java API):

import java.util.HashMap;

Liberator liberator = new Liberator();
// Submit the query; put the query parameters (see Parameters) into the map first.
Object queryId = liberator.query(new HashMap<>());

// Each result carries the ID of the query it belongs to and a data payload.
for (Liberator.QueryResult res : liberator.Generate()) {
    Object resId = res.GetId();
    Object data = res.GetData();
    if (data instanceof org.apache.arrow.vector.VectorSchemaRoot) {
        // Arrow record batch; printRecordBatch is a helper not shown here.
        org.apache.arrow.vector.VectorSchemaRoot recordBatch = (org.apache.arrow.vector.VectorSchemaRoot) data;
        printRecordBatch(recordBatch);
    } else if (data instanceof javax.json.JsonValue) {
        // JSON payload
        javax.json.JsonValue jsonValue = (javax.json.JsonValue) data;
        System.out.println(jsonValue);
    }
}

Point-In-Time Queries

To run a point-in-time query, omit the “back_to” parameter. You will receive point-in-time data for each specified symbol, based on the as_of date/time.

If you also omit “as_of”, Liberator assumes the current date/time and fetches the most recent point-in-time data.
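As a minimal sketch of the rule above, the helper below builds a parameter map with no “back_to” key and adds “as_of” only when one is supplied. The class name, the HashMap<String, Object> typing, and the dataset name "daily_bars" are assumptions made for illustration; they are not taken from the Liberator API.

```java
import java.util.HashMap;
import java.util.List;

class PointInTimeParams {
    // Builds parameters for a point-in-time query: "back_to" is never set.
    // Passing null for asOf leaves out "as_of", so the service defaults to now.
    static HashMap<String, Object> build(String dataset, List<String> symbols, String asOf) {
        HashMap<String, Object> params = new HashMap<>();
        params.put("name", dataset);
        params.put("symbols", symbols);
        if (asOf != null) {
            params.put("as_of", asOf);
        }
        return params;
    }

    public static void main(String[] args) {
        // "daily_bars" is a placeholder dataset name, not a real one.
        System.out.println(build("daily_bars", List.of("AAPL"), "2023-01-03"));
    }
}
```

The resulting map would then be passed to liberator.query. Starting with a single symbol keeps the first result small while you gauge the size of the dataset.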

Time Series Queries

To specify a time series query, add a “back_to” parameter to your query.

The back_to value must be a timestamp earlier than your “as_of” value.

If you omit the “as_of” parameter, it defaults to now, the current point in time.
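The ordering rule above can be checked on the client before a query is sent. The sketch below is one possible way to do that, not part of the Liberator API: the class name, the HashMap<String, Object> typing, and the dataset name "daily_bars" are assumptions. When “as_of” is omitted it compares against the local clock, which is only a sanity check because the time zone the service uses is not stated here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;

class TimeSeriesParams {
    private static final DateTimeFormatter FULL =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Parses "YYYY-MM-DD" or "YYYY-MM-DD HH:MM:SS"; a bare date is read as midnight.
    static LocalDateTime parse(String ts) {
        return ts.length() == 10
                ? LocalDate.parse(ts).atStartOfDay()
                : LocalDateTime.parse(ts, FULL);
    }

    // Builds time series parameters; pass null for asOf to let the service default to now.
    static HashMap<String, Object> build(String dataset, List<String> symbols,
                                         String backTo, String asOf) {
        LocalDateTime end = (asOf == null) ? LocalDateTime.now() : parse(asOf);
        if (!parse(backTo).isBefore(end)) {
            throw new IllegalArgumentException("back_to must be earlier than as_of");
        }
        HashMap<String, Object> params = new HashMap<>();
        params.put("name", dataset);
        params.put("symbols", symbols);
        params.put("back_to", backTo);
        if (asOf != null) {
            params.put("as_of", asOf);
        }
        return params;
    }

    public static void main(String[] args) {
        // "daily_bars" is a placeholder dataset name, not a real one.
        System.out.println(build("daily_bars", List.of("AAPL"),
                "2023-01-01", "2023-01-31 16:00:00"));
    }
}
```

Keeping the window short (here, one month for one symbol) follows the earlier advice about limiting time frames until you know how large a dataset is.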

Parameters

| Argument | Description | Type |
| --- | --- | --- |
| symbols | The security trading symbol(s) you wish to query. | String or List |
| name | The name of the dataset (required). | String |
| as_of | The point in time at which you want to view the data. Defaults to now. This value can be any past date, so that you can see the data as it was known on the “as of” date. | String. Format YYYY-MM-DD HH:MM:SS (HH:MM:SS is optional) |
| back_to | The date at which the returned dataset should begin. The query reads all the data “back to” the specified date. | String. Format YYYY-MM-DD HH:MM:SS (HH:MM:SS is optional) |
| url | Optional. The URL of the Liberator server that you are querying. Defaults to 'https://api.cloudquant.ai'. | String |
| system | The name of the authorized system that you are querying from. The security mechanism in Liberator authorizes users for individual systems. | String. This is almost always “API” |
| fields | Filters the returned columns down to just those required. You cannot remove mandatory fields. | String or List |
| crux_key | Key for querying a Crux dataset. | String (your Crux Key) |
| compress | The data compression method on the wire. CloudQuant uses compression. | String. Always compressed_transfer |
| json_xfer | JSON transfer. This is handled for you by the API depending on the type of query. | Boolean |
| user | The user identifier (as assigned by CloudQuant). | String |
| token | The user’s assigned token. | String |
| SQL | SQL query for Liberator instances that support SQL passthrough. | String |
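To illustrate how several of the parameters above might be combined, the sketch below formats “back_to” and “as_of” in the documented YYYY-MM-DD HH:MM:SS form and puts them in a map alongside “symbols”, “fields”, and “system”. The class name, the dataset name "daily_bars", the column names "open" and "close", and the HashMap<String, Object> typing are all hypothetical. Whether the Java client expects “user” and “token” in this map or reads them from configuration is not stated here, so they are left out.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;

class QueryParamsExample {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Formats a date-time as YYYY-MM-DD HH:MM:SS.
    static String timestamp(LocalDateTime t) {
        return t.format(TS);
    }

    // Formats a date as YYYY-MM-DD (the HH:MM:SS part is optional).
    static String date(LocalDate d) {
        return d.toString();
    }

    static HashMap<String, Object> build(LocalDate backTo, LocalDateTime asOf) {
        HashMap<String, Object> params = new HashMap<>();
        params.put("name", "daily_bars");                // placeholder dataset name
        params.put("symbols", List.of("AAPL", "MSFT"));  // String or List
        params.put("fields", List.of("open", "close"));  // hypothetical column names
        params.put("system", "API");                     // almost always "API"
        params.put("back_to", date(backTo));
        params.put("as_of", timestamp(asOf));
        return params;
    }

    public static void main(String[] args) {
        System.out.println(build(LocalDate.of(2023, 1, 1),
                LocalDateTime.of(2023, 1, 31, 16, 0, 0)));
    }
}
```

Building the strings from java.time values rather than by hand avoids malformed timestamps such as single-digit months or days.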