# Documentation Index

Fetch the complete documentation index at: https://knowledge.cloudquant.com/llms.txt
Use this file to discover all available pages before exploring further.

# Datasource Configuration
CloudQuant Data Liberator supports a wide range of datasource types for ingesting time series data. Each datasource requires a connection (how to reach the data) and a dataset (what data to extract and how to interpret it).
## Supported Datasource Types

### File-Based Sources

| Type | Description |
|---|---|
| Local File (CSV/TSV) | Flat files on local/mounted storage |
| S3 | Amazon S3 or S3-compatible object storage |
| Azure Blob Storage | Microsoft Azure Blob containers |
| SFTP | SSH File Transfer Protocol servers |
| FTPS | FTP over TLS/SSL |
| CIFS/SMB | Windows/Samba network file shares |
### Database Sources

| Type | Description |
|---|---|
| PostgreSQL | High-performance native driver |
| MySQL | Via ODBC driver (MySQL-compatible) |
| SQL Server | Via ODBC driver (ODBC Driver 18) |
| Oracle | Via Oracle database driver (thin mode) |
| Snowflake | High-performance native driver |
## Additional Supported File Types
CloudQuant Data Liberator also supports Arrow IPC (binary columnar), Parquet, Excel (.xlsx), XML, and HDF5 files. These are configured the same way as other file-based sources through the CloudQuant Data Liberator UI.
## Architecture: Connection + Dataset
Every datasource in CloudQuant Data Liberator is composed of two parts:
### Connection
Defines how to reach the data — credentials, endpoints, paths, and transport protocol.
Connection → "Where is the data and how do I authenticate?"
### Dataset
Defines what to extract — which table/files, timestamp columns, key columns, schema, and data frequency.
Dataset → "What data do I want and how do I interpret it?"
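Putting the two parts together, a datasource definition might look like the following sketch. Only the `data_dt_*` and `data_key_column` fields are documented on this page; the connection fields (`type`, `host`, `port`, `database`, `username`, `table`) are illustrative assumptions, not the exact CloudQuant Data Liberator schema:

```json
{
  "connection": {
    "type": "postgresql",
    "host": "db.example.com",
    "port": 5432,
    "database": "marketdata",
    "username": "reader"
  },
  "dataset": {
    "table": "daily_bars",
    "data_dt_column": "trade_date",
    "data_dt_format": "datetime",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol"
  }
}
```

The split means one connection can be reused by many datasets that extract different tables or file patterns from the same source.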
## Common Configuration Concepts

### Timestamp Configuration

All datasources require timestamp configuration to map source data into CloudQuant Data Liberator's microsecond timestamp (muts) format:
| Field | Description |
|---|---|
| `data_dt_column` | Column(s) containing the datetime |
| `data_dt_format` | Format string or parsing specification |
| `data_dt_timezone` | Timezone of the source data (e.g., "UTC", "America/New_York") |
| `data_dt_nudge` | Microsecond offset applied to timestamps |
Supported values for `data_dt_format`:

| Format | Description |
|---|---|
| `"%Y-%m-%d %H:%M:%S"` | Standard strptime format |
| `"datetime"` | Native database datetime column |
| `"date"` | Native date column (date32/date64) |
| `"muts"` | Unix epoch microseconds |
| `"uts"` | Unix epoch seconds |
| `"nuts"` | Unix epoch nanoseconds |
| `true` | Auto-detect native datetime (database sources) |
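For example, a source whose timestamps arrive as formatted strings in New York local time might be configured as follows (a sketch using only the fields documented above; the column name is illustrative):

```json
{
  "data_dt_column": "trade_time",
  "data_dt_format": "%Y-%m-%d %H:%M:%S",
  "data_dt_timezone": "America/New_York",
  "data_dt_nudge": 0
}
```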
### Key Column Configuration

The `data_key_column` field defines the symbol/key used for filtering queries. A simple string:

```json
"data_key_column": "symbol"
```

A composite key built from literals and columns:

```json
"data_key_column": [
  {"type": "column", "value": "exchange"},
  {"type": "literal", "value": "_"},
  {"type": "column", "value": "ticker"}
]
```
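The composite form concatenates literal text and per-row column values in order. A minimal Python sketch of that resolution logic (`resolve_key` is an illustrative helper, not part of CloudQuant Data Liberator):

```python
def resolve_key(spec, row):
    """Resolve a data_key_column spec against one row of source data.

    spec is either a plain column name (string) or a list of
    {"type": "column" | "literal", "value": ...} parts.
    """
    if isinstance(spec, str):
        return str(row[spec])
    parts = []
    for part in spec:
        if part["type"] == "column":
            parts.append(str(row[part["value"]]))
        else:  # "literal" parts are emitted verbatim
            parts.append(part["value"])
    return "".join(parts)

spec = [
    {"type": "column", "value": "exchange"},
    {"type": "literal", "value": "_"},
    {"type": "column", "value": "ticker"},
]
row = {"exchange": "NYSE", "ticker": "AAPL"}
print(resolve_key(spec, row))  # NYSE_AAPL
```

Under this reading, the composite example above yields keys like `NYSE_AAPL`, so queries filter on the exchange-qualified symbol.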
### Schema Definition

Each column in a dataset schema requires:

```json
{
  "name": "column_name",
  "type": "int64",
  "group": "value",
  "description": "Human-readable description",
  "display_name": "Display Name"
}
```

Column types: `string`, `int64`, `uint64`, `double`, `float`, `bool`, `date32`, `date64`, `time64`

Column groups:

- `key` — Symbol/key columns
- `time` — Timestamp columns
- `value` — Data columns
- `meta` — System columns (`_seq`, `muts`, etc.)
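A complete schema for a simple price dataset might combine those groups as follows (the column names and descriptions are illustrative):

```json
[
  {"name": "symbol", "type": "string", "group": "key",
   "description": "Ticker symbol", "display_name": "Symbol"},
  {"name": "trade_time", "type": "string", "group": "time",
   "description": "Trade timestamp", "display_name": "Trade Time"},
  {"name": "close", "type": "double", "group": "value",
   "description": "Closing price", "display_name": "Close"}
]
```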
### Auto-Generated Columns

CloudQuant Data Liberator automatically generates these columns if they are not present in the source data:

| Column | Type | Description |
|---|---|---|
| `_seq` | uint64 | Sequential row number within partition |
| `muts` | int64 | Microseconds since Unix epoch |
| `timestamp` | string | Human-readable timestamp (America/New_York) |
| `symbol` | string | Key column (copied from `data_key_column`) |
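The relationship between `muts` and the human-readable `timestamp` column can be sketched in Python; the exact display format below is an assumption, not CloudQuant Data Liberator's documented output:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_muts(dt: datetime) -> int:
    """Microseconds since the Unix epoch for a timezone-aware datetime."""
    return int(dt.timestamp() * 1_000_000)

def to_display(muts: int) -> str:
    """Render muts in America/New_York, as the timestamp column does."""
    dt = datetime.fromtimestamp(muts / 1_000_000, tz=ZoneInfo("America/New_York"))
    return dt.strftime("%Y-%m-%d %H:%M:%S.%f")

dt = datetime(2024, 1, 2, 14, 30, tzinfo=timezone.utc)
muts = to_muts(dt)
print(muts)              # 1704205800000000
print(to_display(muts))  # 2024-01-02 09:30:00.000000
```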
For file-based sources, dates can be extracted from filenames:
| Field | Description | Example |
|---|---|---|
| `fname_dt_regex` | Regex to match the date portion of the filename | `data_(\d{4}-\d{2}-\d{2})\.csv` |
| `fname_dt_format` | strptime format for the matched portion | `%Y-%m-%d` |
| `fname_dt_timezone` | Timezone of the filename date | UTC |
| `fname_dt_nudge` | Microsecond offset | 0 |
| `fname_dt_approx_seconds` | Approximate seconds per file | 86400 |
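The filename fields compose roughly as in this Python sketch (`filename_to_muts` is an illustrative helper, not CloudQuant Data Liberator internals; it assumes the regex's first capture group holds the date):

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo

def filename_to_muts(fname, regex, fmt, tz, nudge=0):
    """Extract the date portion of a filename and convert it to muts."""
    m = re.search(regex, fname)
    if m is None:
        raise ValueError(f"no date found in {fname!r}")
    # Parse the captured text, attach the configured timezone, apply the nudge.
    dt = datetime.strptime(m.group(1), fmt).replace(tzinfo=ZoneInfo(tz))
    return int(dt.timestamp() * 1_000_000) + nudge

muts = filename_to_muts(
    "data_2024-01-02.csv",
    r"data_(\d{4}-\d{2}-\d{2})\.csv",
    "%Y-%m-%d",
    "UTC",
)
print(muts)  # 1704153600000000
```

`fname_dt_approx_seconds` (86400 for daily files) then tells the system roughly how much time each file covers, which a loader can use to bound queries without opening every file.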