Datasource configuration

CloudQuant Data Liberator supports a wide range of datasource types for ingesting time series data. Each datasource requires a connection (how to reach the data) and a dataset (what data to extract and how to interpret it).

See Supported Data Formats for the canonical list of file extensions and data source categories Liberator can ingest, including formats added in 2.1 and 2.2.

Supported datasource types

File-based sources

Type	Description
Local File (CSV/TSV)	Flat files on local/mounted storage
S3	Amazon S3 or S3-compatible object storage
Azure Blob Storage	Microsoft Azure Blob containers
SharePoint / OneDrive	Microsoft 365 file storage via share link (2.2+)
SFTP	SSH File Transfer Protocol servers
FTPS	FTP over TLS/SSL
CIFS/SMB	Windows/Samba network file shares

Database sources

Type	Description
PostgreSQL	High-performance native driver
MySQL	Via ODBC driver (MySQL-compatible)
SQL Server	Via ODBC driver (ODBC Driver 18)
Oracle	Via Oracle database driver (thin mode)
Snowflake	High-performance native driver

File and data formats

Beyond the delimited text examples in each connection guide, Liberator also ingests Parquet, Arrow IPC, Excel, XML, HDF5, ZIP archives, database tables, and API payloads. PSV (2.1+) and PCAP/PCAPng FIX capture (2.2+) are documented in Supported Data Formats.

Architecture: connection + dataset

Every datasource in CloudQuant Data Liberator is composed of two parts:

Connection

Defines how to reach the data — credentials, endpoints, paths, and transport protocol.

Connection → "Where is the data and how do I authenticate?"

Dataset

Defines what to extract — which table/files, timestamp columns, key columns, schema, and data frequency.

Dataset → "What data do I want and how do I interpret it?"
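To make the split concrete, here is a minimal sketch that models a datasource as two separate objects. The class names (`Connection`, `Dataset`, `Datasource`) and the fields `kind`, `endpoint`, `credentials`, and `source` are hypothetical and chosen for illustration only. They are not Liberator's actual API. Only the `data_*` field names come from this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical containers: class and field names are illustrative
# assumptions, not CloudQuant Data Liberator's actual API.


@dataclass
class Connection:
    """Where is the data and how do I authenticate?"""
    kind: str                                   # e.g. "s3", "sftp", "postgresql"
    endpoint: str                               # host, URL, bucket, or share path
    credentials: Dict[str, str] = field(default_factory=dict)


@dataclass
class Dataset:
    """What data do I want and how do I interpret it?"""
    source: str                                 # table name or file path/pattern
    data_dt_column: str
    data_dt_format: str
    data_dt_timezone: str = "UTC"
    data_key_column: Union[str, List[Dict[str, str]]] = "symbol"
    schema: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class Datasource:
    connection: Connection                      # how to reach the data
    dataset: Dataset                            # what to extract from it


trades = Datasource(
    connection=Connection(kind="postgresql", endpoint="db.example.com:5432",
                          credentials={"user": "reader"}),
    dataset=Dataset(source="trades", data_dt_column="trade_time",
                    data_dt_format="datetime",
                    data_dt_timezone="America/New_York"),
)
```

Keeping the two halves separate means credentials and transport details can change without touching how the data is interpreted, and vice versa.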

Common configuration concepts

Timestamp configuration

All datasources require timestamp configuration to map source data into CloudQuant Data Liberator’s microsecond timestamp (muts) format:

Field	Description
`data_dt_column`	Column(s) containing the datetime
`data_dt_format`	Format string or parsing specification
`data_dt_timezone`	Timezone of the source data (e.g., `"UTC"`, `"America/New_York"`)
`data_dt_nudge`	Microsecond offset applied to timestamps
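The sketch below shows how the four timestamp fields could combine to produce a `muts` value for a string column parsed with a strptime format. The function name `to_muts` is hypothetical. The sketch also assumes that `data_dt_nudge` is simply added to the result. It is not Liberator's internal implementation.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def resolve_tz(name: str):
    # Handle UTC directly; other names are looked up in the IANA tz database.
    return timezone.utc if name == "UTC" else ZoneInfo(name)


def to_muts(value: str, data_dt_format: str, data_dt_timezone: str = "UTC",
            data_dt_nudge: int = 0) -> int:
    """Parse a source datetime string and return microseconds since the Unix epoch."""
    naive = datetime.strptime(value, data_dt_format)
    # Interpret the wall-clock time in the source timezone
    aware = naive.replace(tzinfo=resolve_tz(data_dt_timezone))
    # Integer microseconds since the epoch (timedelta // timedelta is exact)
    muts = (aware - EPOCH) // timedelta(microseconds=1)
    return muts + data_dt_nudge  # assumption: the nudge is simply added
```

For example, with `data_dt_timezone="America/New_York"`, the source value `2024-01-02 09:30:00` (EST, UTC-5) corresponds to 14:30 UTC. That is `1704205800000000` microseconds since the epoch.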

Supported datetime formats

Format	Description
`"%Y-%m-%d %H:%M:%S"`	Standard strptime format
`"datetime"`	Native database datetime column
`"date"`	Native date column (date32/date64)
`"muts"`	Unix epoch microseconds
`"uts"`	Unix epoch seconds
`"nuts"`	Unix epoch nanoseconds
`true`	Auto-detect native datetime (database sources)
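The three epoch formats differ only in their unit. The hypothetical helper below shows the unit arithmetic for normalizing each one to microseconds. Flooring sub-microsecond nanosecond values is an assumption; Liberator may round differently.

```python
def epoch_to_muts(value: int, data_dt_format: str) -> int:
    """Normalize an integer Unix epoch value to microseconds (muts)."""
    if data_dt_format == "muts":
        return value                    # already microseconds
    if data_dt_format == "uts":
        return value * 1_000_000        # seconds -> microseconds
    if data_dt_format == "nuts":
        return value // 1_000           # nanoseconds -> microseconds (floors the remainder)
    raise ValueError(f"not an epoch format: {data_dt_format!r}")
```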

Key column configuration

The `data_key_column` field defines the symbol or key used to filter queries. It accepts either a single column name or a list of column and literal parts:

```
# Simple string
"data_key_column": "symbol"

# Composite key with literals and columns
"data_key_column": [
    {"type": "column", "value": "exchange"},
    {"type": "literal", "value": "_"},
    {"type": "column", "value": "ticker"}
]
```
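This sketch shows one way a `data_key_column` spec could be resolved against a single row. The helper name `build_key` is hypothetical. The sketch also assumes that the parts of a composite key are concatenated in order, which the `"_"` literal in the example suggests.

```python
def build_key(spec, row: dict) -> str:
    """Resolve a data_key_column spec (a column name or a list of parts) for one row."""
    if isinstance(spec, str):
        return str(row[spec])                       # simple form: copy one column
    parts = []
    for part in spec:
        if part["type"] == "column":
            parts.append(str(row[part["value"]]))   # value taken from the row
        elif part["type"] == "literal":
            parts.append(part["value"])             # fixed text, e.g. a separator
        else:
            raise ValueError(f"unknown key part type: {part['type']!r}")
    return "".join(parts)                           # assumed: parts joined in order
```

Under this sketch, a row with `exchange` = `NASDAQ` and `ticker` = `AAPL` yields the key `NASDAQ_AAPL` for the composite spec.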

Schema definition

Each column in a dataset schema requires:

```json
{
    "name": "column_name",
    "type": "int64",
    "group": "value",
    "description": "Human-readable description",
    "display_name": "Display Name"
}
```

Column types: `string`, `int64`, `uint64`, `double`, `float`, `bool`, `date32`, `date64`, `time64`.

Column groups:

`key`: symbol/key columns
`time`: timestamp columns
`value`: data columns
`meta`: system columns (`_seq`, `muts`, etc.)
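Before registering a dataset, it can help to check each schema entry against the allowed types and groups. The sketch below does that using only the field names, types, and groups listed on this page. The function `validate_column` is a hypothetical pre-flight check and not part of Liberator.

```python
COLUMN_TYPES = {"string", "int64", "uint64", "double", "float", "bool",
                "date32", "date64", "time64"}
COLUMN_GROUPS = {"key", "time", "value", "meta"}
REQUIRED_FIELDS = ("name", "type", "group", "description", "display_name")


def validate_column(col: dict) -> list:
    """Return a list of problems with one schema column entry (empty if valid)."""
    problems = [f"missing field {f!r}" for f in REQUIRED_FIELDS if f not in col]
    if "type" in col and col["type"] not in COLUMN_TYPES:
        problems.append(f"unknown type {col['type']!r}")
    if "group" in col and col["group"] not in COLUMN_GROUPS:
        problems.append(f"unknown group {col['group']!r}")
    return problems
```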

Auto-generated columns

CloudQuant Data Liberator automatically generates the following columns when they are not present in the source data:

Column	Type	Description
`_seq`	uint64	Sequential row number within partition
`muts`	int64	Microseconds since Unix epoch
`timestamp`	string	Human-readable timestamp (America/New_York)
`symbol`	string	Key column (copied from `data_key_column`)
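As a rough illustration of what these columns contain, the sketch below fills in `_seq`, `timestamp`, and `symbol` for rows that already carry a `muts` value. Several details are assumptions rather than documented behavior: the function name `add_generated_columns`, `_seq` starting at 0, and the ISO 8601 rendering of `timestamp`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def add_generated_columns(rows, key_column, display_tz="America/New_York"):
    """Fill in _seq, timestamp and symbol for rows that lack them.

    Assumes each row already carries a muts value (e.g. from the timestamp config).
    """
    tz = timezone.utc if display_tz == "UTC" else ZoneInfo(display_tz)
    out = []
    for seq, row in enumerate(rows):
        row = dict(row)                               # copy; leave the input untouched
        row.setdefault("_seq", seq)                   # sequential number within this batch
        if "timestamp" not in row:
            dt = EPOCH + timedelta(microseconds=row["muts"])
            row["timestamp"] = dt.astimezone(tz).isoformat()  # assumed ISO 8601 rendering
        if "symbol" not in row:
            row["symbol"] = row[key_column]           # copy of the key column
        out.append(row)
    return out
```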

File name date extraction

For file-based sources, dates can be extracted from filenames:

Field	Description	Example
`fname_dt_regex`	Regex to match date portion of filename	`data_(\d{4}-\d{2}-\d{2})\.csv`
`fname_dt_format`	strptime format for the matched portion	`%Y-%m-%d`
`fname_dt_timezone`	Timezone of the filename date	`UTC`
`fname_dt_nudge`	Microsecond offset	`0`
`fname_dt_approx_seconds`	Approximate seconds per file	`86400`
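The sketch below applies the example settings from the table to a filename. It uses the first capture group of `fname_dt_regex` when one exists and applies the timezone and nudge. It also treats `fname_dt_approx_seconds` as the approximate time span a single file covers. That interpretation, and the helper name `fname_to_range`, are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fname_to_range(fname, fname_dt_regex, fname_dt_format,
                   fname_dt_timezone="UTC", fname_dt_nudge=0,
                   fname_dt_approx_seconds=86400):
    """Return (start_muts, approx_end_muts) for a file, or None if its name does not match."""
    m = re.search(fname_dt_regex, fname)
    if m is None:
        return None
    # Use the first capture group if the regex has one, else the whole match
    date_text = m.group(1) if m.groups() else m.group(0)
    tz = timezone.utc if fname_dt_timezone == "UTC" else ZoneInfo(fname_dt_timezone)
    dt = datetime.strptime(date_text, fname_dt_format).replace(tzinfo=tz)
    start = (dt - EPOCH) // timedelta(microseconds=1) + fname_dt_nudge
    # Assumption: approx_seconds estimates how much time one file spans
    return start, start + fname_dt_approx_seconds * 1_000_000
```

With the example values, `data_2024-01-02.csv` maps to a start of `1704153600000000` (midnight UTC) and an approximate end one day later.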
