
# Datasource Configuration Overview

> Guide to configuring datasource connections in CloudQuant Data Liberator


CloudQuant Data Liberator supports a wide range of datasource types for ingesting time series data. Each datasource requires a **connection** (how to reach the data) and a **dataset** (what data to extract and how to interpret it).

<Note>
  See [Supported Data Formats](/datasource-config/supported-formats) for the canonical list of file extensions and data source categories Liberator can ingest, including formats added in 2.1 and 2.2.
</Note>

## Supported datasource types

### File-based sources

| Type                                                            | Description                                      |
| --------------------------------------------------------------- | ------------------------------------------------ |
| [Local File (CSV/TSV)](/datasource-config/local-file)           | Flat files on local/mounted storage              |
| [S3](/datasource-config/s3)                                     | Amazon S3 or S3-compatible object storage        |
| [Azure Blob Storage](/datasource-config/azure-blob)             | Microsoft Azure Blob containers                  |
| [SharePoint / OneDrive](/datasource-config/sharepoint-onedrive) | Microsoft 365 file storage via share link (2.2+) |
| [SFTP](/datasource-config/sftp)                                 | SSH File Transfer Protocol servers               |
| [FTPS](/datasource-config/ftps)                                 | FTP over TLS/SSL                                 |
| [CIFS/SMB](/datasource-config/cifs)                             | Windows/Samba network file shares                |

### Database sources

| Type                                        | Description                            |
| ------------------------------------------- | -------------------------------------- |
| [PostgreSQL](/datasource-config/postgresql) | High-performance native driver         |
| [MySQL](/datasource-config/mysql)           | Via ODBC driver (MySQL-compatible)     |
| [SQL Server](/datasource-config/mssql)      | Via ODBC driver (ODBC Driver 18)       |
| [Oracle](/datasource-config/oracle)         | Via Oracle database driver (thin mode) |
| [Snowflake](/datasource-config/snowflake)   | High-performance native driver         |

### File and data formats

Beyond the delimited text examples in each connection guide, Liberator also ingests Parquet, Arrow IPC, Excel, XML, HDF5, ZIP archives, database tables, and API payloads. PSV (2.1+) and PCAP/PCAPng FIX capture (2.2+) are documented in [Supported Data Formats](/datasource-config/supported-formats).

## Architecture: connection + dataset

Every datasource in CloudQuant Data Liberator is composed of two parts:

### Connection

Defines **how to reach** the data — credentials, endpoints, paths, and transport protocol.

```
Connection → "Where is the data and how do I authenticate?"
```

### Dataset

Defines **what to extract** — which table/files, timestamp columns, key columns, schema, and data frequency.

```
Dataset → "What data do I want and how do I interpret it?"
```

## Common configuration concepts

### Timestamp configuration

All datasources require timestamp configuration to map source data into CloudQuant Data Liberator's microsecond timestamp (`muts`) format:

| Field              | Description                                                       |
| ------------------ | ----------------------------------------------------------------- |
| `data_dt_column`   | Column(s) containing the datetime                                 |
| `data_dt_format`   | Format string or parsing specification                            |
| `data_dt_timezone` | Timezone of the source data (e.g., `"UTC"`, `"America/New_York"`) |
| `data_dt_nudge`    | Microsecond offset applied to timestamps                          |
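The sketch below illustrates how these four fields could combine to produce a `muts` value for a single string value. It is not Liberator's implementation: the function name `to_muts` is hypothetical, and it assumes the nudge is simply added after timezone conversion and that `data_dt_format` is a strptime pattern.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_muts(value: str, data_dt_format: str, data_dt_timezone: str,
            data_dt_nudge: int = 0) -> int:
    """Parse a source datetime string into microseconds since the Unix epoch."""
    naive = datetime.strptime(value, data_dt_format)
    # Interpret the naive value in the source timezone (assumed semantics).
    aware = naive.replace(tzinfo=ZoneInfo(data_dt_timezone))
    delta = aware - EPOCH
    # Integer arithmetic avoids float rounding at microsecond precision.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    # Assumption: the nudge is added to the converted value.
    return micros + data_dt_nudge


# 09:30 in New York on 2 January 2024 is 14:30 UTC.
print(to_muts("2024-01-02 09:30:00", "%Y-%m-%d %H:%M:%S", "America/New_York"))
# 1704205800000000
```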

#### Supported datetime formats

| Format                | Description                                    |
| --------------------- | ---------------------------------------------- |
| `"%Y-%m-%d %H:%M:%S"` | Standard strptime format                       |
| `"datetime"`          | Native database datetime column                |
| `"date"`              | Native date column (date32/date64)             |
| `"muts"`              | Unix epoch microseconds                        |
| `"uts"`               | Unix epoch seconds                             |
| `"nuts"`              | Unix epoch nanoseconds                         |
| `true`                | Auto-detect native datetime (database sources) |
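For the three epoch-based formats, the conversion to `muts` is a change of unit. The following sketch shows that arithmetic under two assumptions: the helper name `epoch_to_muts` is hypothetical, and nanosecond inputs are truncated (not rounded) to microseconds.

```python
def epoch_to_muts(value: int, data_dt_format: str) -> int:
    """Normalize an integer epoch value to microseconds since the Unix epoch."""
    if data_dt_format == "uts":
        return value * 1_000_000   # seconds -> microseconds
    if data_dt_format == "muts":
        return value               # already microseconds
    if data_dt_format == "nuts":
        return value // 1_000      # nanoseconds -> microseconds (truncation assumed)
    raise ValueError(f"not an epoch format: {data_dt_format!r}")


print(epoch_to_muts(1704205800, "uts"))  # 1704205800000000
```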

### Key column configuration

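The `data_key_column` field defines the symbol/key used for filtering queries. It accepts either a single column name or a list of `column` and `literal` parts that together form a composite key: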

```python theme={null}
# Simple string: a single source column holds the key
simple_key = {"data_key_column": "symbol"}

# Composite key with literals and columns
composite_key = {
    "data_key_column": [
        {"type": "column", "value": "exchange"},
        {"type": "literal", "value": "_"},
        {"type": "column", "value": "ticker"},
    ]
}
```
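To make the composite form concrete, here is a minimal sketch of how such a specification might be resolved against one row. It assumes the parts are concatenated in listed order; the helper `build_key` is hypothetical and not part of Liberator.

```python
def build_key(row: dict, data_key_column) -> str:
    """Resolve a data_key_column specification against one row of source data."""
    if isinstance(data_key_column, str):
        # Simple form: the key is the value of a single column.
        return str(row[data_key_column])
    parts = []
    for part in data_key_column:
        if part["type"] == "column":
            parts.append(str(row[part["value"]]))
        elif part["type"] == "literal":
            parts.append(part["value"])
        else:
            raise ValueError(f"unknown part type: {part['type']!r}")
    # Assumption: parts are joined in order with no extra separator.
    return "".join(parts)


spec = [
    {"type": "column", "value": "exchange"},
    {"type": "literal", "value": "_"},
    {"type": "column", "value": "ticker"},
]
print(build_key({"exchange": "NYSE", "ticker": "IBM"}, spec))  # NYSE_IBM
```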

### Schema definition

Each column in a dataset schema requires:

```json theme={null}
{
    "name": "column_name",
    "type": "int64",
    "group": "value",
    "description": "Human-readable description",
    "display_name": "Display Name"
}
```

**Column types:** `string`, `int64`, `uint64`, `double`, `float`, `bool`, `date32`, `date64`, `time64`

**Column groups:**

* `key` — Symbol/key columns
* `time` — Timestamp columns
* `value` — Data columns
* `meta` — System columns (`_seq`, `muts`, etc.)
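A column definition can be checked against the types and groups listed above before it is submitted. The sketch below is one way to do that; `validate_column` is a hypothetical helper, and treating all five fields as mandatory follows the wording of this section, not any confirmed validation rule.

```python
COLUMN_TYPES = {"string", "int64", "uint64", "double", "float",
                "bool", "date32", "date64", "time64"}
COLUMN_GROUPS = {"key", "time", "value", "meta"}
REQUIRED_FIELDS = ("name", "type", "group", "description", "display_name")


def validate_column(column: dict) -> list[str]:
    """Return a list of problems found in one schema column definition."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in column]
    if "type" in column and column["type"] not in COLUMN_TYPES:
        problems.append(f"unknown type: {column['type']}")
    if "group" in column and column["group"] not in COLUMN_GROUPS:
        problems.append(f"unknown group: {column['group']}")
    return problems


column = {
    "name": "volume",
    "type": "int64",
    "group": "value",
    "description": "Shares traded",
    "display_name": "Volume",
}
print(validate_column(column))  # []
```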

### Auto-generated columns

CloudQuant Data Liberator automatically generates these columns if they are not present in the source data:

| Column      | Type   | Description                                  |
| ----------- | ------ | -------------------------------------------- |
| `_seq`      | uint64 | Sequential row number within partition       |
| `muts`      | int64  | Microseconds since Unix epoch                |
| `timestamp` | string | Human-readable timestamp (America/New\_York) |
| `symbol`    | string | Key column (copied from `data_key_column`)   |
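As a rough illustration of how the derived columns relate to the source data, the sketch below fills in `_seq`, `symbol`, and `timestamp` for rows that already carry a `muts` value. Several details are assumptions not confirmed by this page: `_seq` starting at 0, the ISO 8601 rendering of `timestamp`, and the helper name `add_generated_columns`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NEW_YORK = ZoneInfo("America/New_York")


def add_generated_columns(rows: list[dict], key_column: str) -> list[dict]:
    """Fill in _seq, symbol, and timestamp for the rows of one partition."""
    enriched_rows = []
    for seq, row in enumerate(rows):
        enriched = dict(row)
        # Assumption: sequence numbers start at 0 within the partition.
        enriched.setdefault("_seq", seq)
        enriched.setdefault("symbol", str(row[key_column]))
        if "timestamp" not in enriched:
            moment = EPOCH + timedelta(microseconds=row["muts"])
            # Assumption: the readable form is ISO 8601 in America/New_York.
            enriched["timestamp"] = moment.astimezone(NEW_YORK).isoformat()
        enriched_rows.append(enriched)
    return enriched_rows


rows = add_generated_columns([{"ticker": "IBM", "muts": 1704205800000000}], "ticker")
print(rows[0]["timestamp"])  # 2024-01-02T09:30:00-05:00
```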

### File name date extraction

For file-based sources, dates can be extracted from filenames:

| Field                     | Description                             | Example                         |
| ------------------------- | --------------------------------------- | ------------------------------- |
| `fname_dt_regex`          | Regex to match date portion of filename | `data_(\d{4}-\d{2}-\d{2})\.csv` |
| `fname_dt_format`         | strptime format for the matched portion | `%Y-%m-%d`                      |
| `fname_dt_timezone`       | Timezone of the filename date           | `UTC`                           |
| `fname_dt_nudge`          | Microsecond offset                      | `0`                             |
| `fname_dt_approx_seconds` | Approximate seconds per file            | `86400`                         |
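The example values in this table can be tried out with Python's `re` and `datetime` modules. This sketch assumes the first capture group of `fname_dt_regex` holds the date text and that the nudge is added after conversion; `filename_to_muts` is a hypothetical helper, and `fname_dt_approx_seconds` is not modeled here.

```python
import re
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def filename_to_muts(filename, fname_dt_regex, fname_dt_format,
                     fname_dt_timezone="UTC", fname_dt_nudge=0):
    """Return the filename's date as epoch microseconds, or None if no match."""
    match = re.search(fname_dt_regex, filename)
    if match is None:
        return None
    # Assumption: the first capture group contains the date portion.
    naive = datetime.strptime(match.group(1), fname_dt_format)
    aware = naive.replace(tzinfo=ZoneInfo(fname_dt_timezone))
    delta = aware - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros + fname_dt_nudge


print(filename_to_muts("data_2024-03-15.csv",
                       r"data_(\d{4}-\d{2}-\d{2})\.csv", "%Y-%m-%d"))
# 1710460800000000
```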
