
Datasource Configuration

CloudQuant Data Liberator supports a wide range of datasource types for ingesting time series data. Each datasource requires a connection (how to reach the data) and a dataset (what data to extract and how to interpret it).

Supported Datasource Types

File-Based Sources

| Type | Description |
|---|---|
| Local File (CSV/TSV) | Flat files on local/mounted storage |
| S3 | Amazon S3 or S3-compatible object storage |
| Azure Blob Storage | Microsoft Azure Blob containers |
| SFTP | SSH File Transfer Protocol servers |
| FTPS | FTP over TLS/SSL |
| CIFS/SMB | Windows/Samba network file shares |

Database Sources

| Type | Description |
|---|---|
| PostgreSQL | High-performance native driver |
| MySQL | Via ODBC driver (MySQL-compatible) |
| SQL Server | Via ODBC driver (ODBC Driver 18) |
| Oracle | Via Oracle database driver (thin mode) |
| Snowflake | High-performance native driver |

Additional Supported File Types

CloudQuant Data Liberator also supports Arrow IPC (binary columnar), Parquet, Excel (.xlsx), XML, and HDF5 files. These are configured the same way as other file-based sources through the CloudQuant Data Liberator UI.

Architecture: Connection + Dataset

Every datasource in CloudQuant Data Liberator is composed of two parts:

Connection

Defines how to reach the data — credentials, endpoints, paths, and transport protocol.
Connection → "Where is the data and how do I authenticate?"

Dataset

Defines what to extract — which table/files, timestamp columns, key columns, schema, and data frequency.
Dataset → "What data do I want and how do I interpret it?"
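
The split between connection and dataset can be pictured as two small records combined into one datasource. The sketch below is illustrative only: the class names and the connection fields (`kind`, `location`, `credentials`) are hypothetical and are not taken from the product, while the dataset fields reuse the `data_*` names described under Common Configuration Concepts.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Where the data lives and how to authenticate (field names are hypothetical)."""
    kind: str                      # e.g. "s3", "sftp", "postgresql"
    location: str                  # endpoint, bucket, or path
    credentials: dict = field(default_factory=dict)


@dataclass
class Dataset:
    """What to extract and how to interpret it (data_* names come from this page)."""
    data_dt_column: str
    data_dt_format: str
    data_dt_timezone: str
    data_key_column: str


@dataclass
class Datasource:
    """A datasource pairs exactly one connection with one dataset."""
    connection: Connection
    dataset: Dataset


# Hypothetical example values, not a real bucket or table.
trades = Datasource(
    connection=Connection(kind="s3", location="s3://example-bucket/trades/"),
    dataset=Dataset(
        data_dt_column="ts",
        data_dt_format="%Y-%m-%d %H:%M:%S",
        data_dt_timezone="UTC",
        data_key_column="symbol",
    ),
)
```

Keeping the two parts separate in this way would let one connection be reused by several datasets that read from the same store.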

Common Configuration Concepts

Timestamp Configuration

All datasources require timestamp configuration to map source data into CloudQuant Data Liberator’s microsecond timestamp (muts) format:
| Field | Description |
|---|---|
| data_dt_column | Column(s) containing the datetime |
| data_dt_format | Format string or parsing specification |
| data_dt_timezone | Timezone of the source data (e.g., "UTC", "America/New_York") |
| data_dt_nudge | Microsecond offset applied to timestamps |
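
To make the roles of these fields concrete, here is a minimal Python sketch of how a source datetime string might be turned into a muts value. It is not CloudQuant Data Liberator's implementation: the function name `to_muts` is hypothetical, and the sketch assumes that `data_dt_nudge` is simply added to the result.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_muts(value: str, data_dt_format: str, data_dt_timezone: str,
            data_dt_nudge: int = 0) -> int:
    """Parse a source datetime string into microseconds since the Unix epoch."""
    naive = datetime.strptime(value, data_dt_format)
    # Interpret the parsed wall-clock time in the source timezone.
    aware = naive.replace(tzinfo=ZoneInfo(data_dt_timezone))
    delta = aware - EPOCH
    # Integer arithmetic avoids float rounding at microsecond precision.
    whole_seconds = delta.days * 86_400 + delta.seconds
    # Assumption: the nudge is an additive microsecond offset.
    return whole_seconds * 1_000_000 + delta.microseconds + data_dt_nudge


# 09:30 in New York on 2 January 2024 is 14:30 UTC.
print(to_muts("2024-01-02 09:30:00", "%Y-%m-%d %H:%M:%S", "America/New_York"))
```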

Supported DateTime Formats

| Format | Description |
|---|---|
| "%Y-%m-%d %H:%M:%S" | Standard strptime format |
| "datetime" | Native database datetime column |
| "date" | Native date column (date32/date64) |
| "muts" | Unix epoch microseconds |
| "uts" | Unix epoch seconds |
| "nuts" | Unix epoch nanoseconds |
| true | Auto-detect native datetime (database sources) |
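
The three epoch formats differ only in their unit, so normalizing them to microseconds is a matter of scaling. The following is a small sketch under that reading; `epoch_to_muts` is a hypothetical helper, and the choice to truncate sub-microsecond digits for "nuts" is an assumption rather than documented behavior.

```python
def epoch_to_muts(value: int, data_dt_format: str) -> int:
    """Convert an integer epoch value to microseconds based on its declared unit."""
    if data_dt_format == "muts":
        return value                 # already microseconds
    if data_dt_format == "uts":
        return value * 1_000_000     # seconds to microseconds
    if data_dt_format == "nuts":
        # Assumption: digits below one microsecond are dropped (floor division).
        return value // 1_000
    raise ValueError(f"not an epoch format: {data_dt_format!r}")


print(epoch_to_muts(1704205800, "uts"))
print(epoch_to_muts(1704205800000000123, "nuts"))
```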

Key Column Configuration

The data_key_column field defines the symbol/key used for filtering queries:
# Simple string
"data_key_column": "symbol"

# Composite key with literals and columns
"data_key_column": [
    {"type": "column", "value": "exchange"},
    {"type": "literal", "value": "_"},
    {"type": "column", "value": "ticker"}
]
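
As an illustration of how a composite key could be resolved for a single row, the sketch below walks the parts in order and concatenates them. The function `build_key` is hypothetical, and in-order concatenation is an assumption inferred from the example above, not confirmed behavior.

```python
def build_key(row: dict, data_key_column) -> str:
    """Resolve a simple or composite data_key_column against one row."""
    if isinstance(data_key_column, str):
        # Simple form: the key is the value of a single column.
        return str(row[data_key_column])
    parts = []
    for part in data_key_column:
        if part["type"] == "column":
            parts.append(str(row[part["value"]]))   # look up the column value
        elif part["type"] == "literal":
            parts.append(part["value"])             # use the text as-is
        else:
            raise ValueError(f"unknown key part type: {part['type']!r}")
    return "".join(parts)


composite = [
    {"type": "column", "value": "exchange"},
    {"type": "literal", "value": "_"},
    {"type": "column", "value": "ticker"},
]
print(build_key({"exchange": "NYSE", "ticker": "IBM"}, composite))  # NYSE_IBM
```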

Schema Definition

Each column in a dataset schema requires:
{
    "name": "column_name",
    "type": "int64",
    "group": "value",
    "description": "Human-readable description",
    "display_name": "Display Name"
}
Column types: string, int64, uint64, double, float, bool, date32, date64, time64
Column groups:
  • key — Symbol/key columns
  • time — Timestamp columns
  • value — Data columns
  • meta — System columns (_seq, muts, etc.)
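
A schema entry can be checked against the required fields, column types, and column groups listed above before it is submitted. The validator below is a hypothetical sketch, not part of CloudQuant Data Liberator; it encodes only the rules stated in this section.

```python
COLUMN_TYPES = {"string", "int64", "uint64", "double", "float",
                "bool", "date32", "date64", "time64"}
COLUMN_GROUPS = {"key", "time", "value", "meta"}
REQUIRED_FIELDS = ("name", "type", "group", "description", "display_name")


def validate_column(column: dict) -> list:
    """Return a list of problems found in one schema column (empty if none)."""
    errors = [f"missing field: {name}"
              for name in REQUIRED_FIELDS if name not in column]
    if "type" in column and column["type"] not in COLUMN_TYPES:
        errors.append(f"unknown type: {column['type']}")
    if "group" in column and column["group"] not in COLUMN_GROUPS:
        errors.append(f"unknown group: {column['group']}")
    return errors


good = {
    "name": "column_name",
    "type": "int64",
    "group": "value",
    "description": "Human-readable description",
    "display_name": "Display Name",
}
print(validate_column(good))  # []
```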

Auto-Generated Columns

CloudQuant Data Liberator automatically generates these columns if not present in source data:
| Column | Type | Description |
|---|---|---|
| _seq | uint64 | Sequential row number within partition |
| muts | int64 | Microseconds since Unix epoch |
| timestamp | string | Human-readable timestamp (America/New_York) |
| symbol | string | Key column (copied from data_key_column) |
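
The sketch below shows one way such columns could be filled in for rows that lack them. It is only an illustration under several assumptions: the helper `add_auto_columns` is hypothetical, each row is assumed to already carry a muts value, `_seq` is assumed to start at 0, and the ISO 8601 rendering of `timestamp` is a guess at the human-readable format.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NEW_YORK = ZoneInfo("America/New_York")


def add_auto_columns(rows: list, data_key_column: str) -> list:
    """Fill in _seq, timestamp, and symbol for rows of one partition if absent."""
    filled_rows = []
    for seq, row in enumerate(rows):          # assumption: _seq starts at 0
        filled = dict(row)                    # leave the input rows untouched
        filled.setdefault("_seq", seq)
        filled.setdefault("symbol", str(row[data_key_column]))
        if "timestamp" not in filled:
            moment = EPOCH + timedelta(microseconds=row["muts"])
            # Assumption: ISO 8601 text in America/New_York.
            filled["timestamp"] = moment.astimezone(NEW_YORK).isoformat()
        filled_rows.append(filled)
    return filled_rows


rows = add_auto_columns([{"muts": 1704205800000000, "ticker": "IBM"}], "ticker")
print(rows[0]["timestamp"])
```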

File Name Date Extraction

For file-based sources, dates can be extracted from filenames:
| Field | Description | Example |
|---|---|---|
| fname_dt_regex | Regex to match date portion of filename | data_(\d{4}-\d{2}-\d{2})\.csv |
| fname_dt_format | strptime format for the matched portion | %Y-%m-%d |
| fname_dt_timezone | Timezone of the filename date | UTC |
| fname_dt_nudge | Microsecond offset | 0 |
| fname_dt_approx_seconds | Approximate seconds per file | 86400 |
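
As a rough illustration of how these fields fit together, the sketch below extracts a date from a filename and converts it to microseconds since the Unix epoch. The function `filename_to_muts` is hypothetical; it assumes the first capture group of `fname_dt_regex` holds the date text and that `fname_dt_nudge` is added to the result. `fname_dt_approx_seconds` is left out because this page does not say how it is applied.

```python
import re
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def filename_to_muts(filename: str, fname_dt_regex: str, fname_dt_format: str,
                     fname_dt_timezone: str = "UTC", fname_dt_nudge: int = 0):
    """Return the filename's date as epoch microseconds, or None if no match."""
    match = re.search(fname_dt_regex, filename)
    if match is None:
        return None
    # Assumption: capture group 1 contains the date portion.
    naive = datetime.strptime(match.group(1), fname_dt_format)
    aware = naive.replace(tzinfo=ZoneInfo(fname_dt_timezone))
    delta = aware - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000 + delta.microseconds + fname_dt_nudge


pattern = r"data_(\d{4}-\d{2}-\d{2})\.csv"
print(filename_to_muts("data_2024-01-02.csv", pattern, "%Y-%m-%d"))
```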