# Supported Data Formats

> File and data formats CloudQuant Data Liberator can ingest, by platform version

CloudQuant Data Liberator ingests time-series and tabular data through **connections** (where data lives) and **datasets** (how files or tables are interpreted). The tables below list every **file extension** and **data source category** the platform understands.

Formats are selected automatically from your `file_pattern`, file extension, or connection type when you onboard through the Liberator UI. You rarely need to set anything manually.
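As a rough illustration of extension-based selection, the sketch below maps a file name to a format family with shell-style glob patterns. It assumes `file_pattern` behaves like a glob, which the `*.csv` examples on this page suggest but do not confirm. `FORMAT_BY_PATTERN`, `detect_format`, and the format family names are hypothetical and not part of the Liberator API.

```python
from fnmatch import fnmatch
from typing import Optional

# Hypothetical pattern-to-format table built from the extensions listed on this page.
FORMAT_BY_PATTERN = {
    "*.csv": "delimited",
    "*.tsv": "delimited",
    "*.txt": "delimited",
    "*.psv": "delimited",
    "*.parquet": "parquet",
    "*.arrow": "arrow_ipc",
    "*.feather": "arrow_ipc",
    "*.xlsx": "excel",
    "*.xls": "excel",
    "*.xml": "xml",
    "*.h5": "hdf5",
    "*.hdf5": "hdf5",
    "*.zip": "zip",
    "*.pcap": "pcap",
    "*.pcapng": "pcap",
}


def detect_format(file_name: str) -> Optional[str]:
    """Return the format family for a file name, or None if no pattern matches."""
    lowered = file_name.lower()  # compare case-insensitively on every platform
    for pattern, fmt in FORMAT_BY_PATTERN.items():
        if fnmatch(lowered, pattern):
            return fmt
    return None
```

Calling `detect_format("trades_2024.CSV")` yields the `delimited` family, while a name with no matching pattern yields `None`, which is the case where a manual setting would be needed.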

<Note>
  This page is the canonical format reference. Each [connection guide](/datasource-config/overview) links here for the formats available on that storage type.
</Note>

## Supported in version 2.0

These formats were available in the Liberator **2.0** release and remain supported on current versions.

### Delimited text files

| Extension | Delimiter       | Configuration                                         |
| --------- | --------------- | ----------------------------------------------------- |
| `.csv`    | Comma (default) | `file_pattern` such as `*.csv`                        |
| `.tsv`    | Tab             | `file_pattern` such as `*.tsv`                        |
| `.txt`    | Auto-detected   | Same pipeline as CSV; delimiter inferred when omitted |

Set `sep_override` in `data_args` to force comma, tab, pipe (`|`), or semicolon when auto-detection is not sufficient.
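The interplay between auto-detection and an explicit override can be sketched with Python's standard `csv.Sniffer`. This is only an illustration of the idea, not Liberator's detection logic; `resolve_delimiter` is a hypothetical helper, and only the `sep_override` name comes from this page.

```python
import csv
from typing import Optional


def resolve_delimiter(sample: str, sep_override: Optional[str] = None) -> str:
    """Pick a delimiter: an explicit override wins, otherwise sniff the sample."""
    if sep_override is not None:
        return sep_override
    # Restrict the sniffer to the four delimiters named above.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t|;")
    return dialect.delimiter


sample = "symbol;price;size\nAAPL;189.5;100\nMSFT;402.1;250\n"
detected = resolve_delimiter(sample)                    # inferred from the sample
forced = resolve_delimiter(sample, sep_override="|")    # override bypasses sniffing
```

Sniffing is heuristic, so an override is the safer choice for files whose first rows are sparse or contain several candidate delimiters.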

### Columnar and binary files

| Extension            | Description                                       |
| -------------------- | ------------------------------------------------- |
| `.parquet`           | Apache Parquet; columnar reads with type pushdown |
| `.arrow`, `.feather` | Apache Arrow IPC (Feather v2)                     |

On S3, Parquet datasets can use **passthrough** mode (direct read without intermediate caching). See the [S3 Parquet example](/datasource-config/s3#parquet-dataset).

### Structured office and scientific files

| Extension       | Description               | Extra configuration                                  |
| --------------- | ------------------------- | ---------------------------------------------------- |
| `.xlsx`, `.xls` | Microsoft Excel workbooks | First sheet sampled at onboarding                    |
| `.xml`          | XML documents             | `xml_args` for element paths (set during onboarding) |
| `.h5`, `.hdf5`  | HDF5 scientific arrays    | `h5py_groups` for dataset path inside the file       |

### Archives

| Extension | Description                                                                             |
| --------- | --------------------------------------------------------------------------------------- |
| `.zip`    | ZIP archive; Liberator inspects the inner file and applies the matching format delegate |
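The inspect-then-delegate behavior can be pictured with the standard `zipfile` module. The sketch below is an assumption about the general shape of that step, not the platform's code: `HANDLER_BY_EXTENSION`, `inner_file_handlers`, and the handler names are hypothetical, and it handles several inner files even though the table above speaks of a single one.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical mapping from inner-file extension to a format handler name.
HANDLER_BY_EXTENSION = {
    ".csv": "delimited",
    ".tsv": "delimited",
    ".txt": "delimited",
    ".parquet": "parquet",
    ".xml": "xml",
}


def inner_file_handlers(archive_bytes: bytes) -> dict:
    """Map each file inside a ZIP archive to a handler name based on its extension."""
    handlers = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            suffix = PurePosixPath(info.filename).suffix.lower()
            handlers[info.filename] = HANDLER_BY_EXTENSION.get(suffix, "unsupported")
    return handlers


# Build a small archive in memory so the sketch runs without touching disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("daily/trades.csv", "symbol,price\nAAPL,189.5\n")
result = inner_file_handlers(buffer.getvalue())
```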

### Database sources

| Connection type | Engine                         | Access pattern            |
| --------------- | ------------------------------ | ------------------------- |
| PostgreSQL      | Native high-performance driver | Table or view per dataset |
| MySQL           | ODBC (MariaDB-compatible)      | Table or view per dataset |
| SQL Server      | ODBC Driver 18                 | Table or view per dataset |
| Oracle          | Thin driver                    | Table or view per dataset |
| Snowflake       | Native driver                  | Table or view per dataset |

### API-backed datasets

REST endpoints that return JSON tabular payloads can be onboarded as **API** connections. The platform normalizes responses into the same query surface as file- and database-backed datasets.

### Schema column types

Regardless of source format, dataset schemas use these column types:

`string`, `int64`, `uint64`, `double`, `float`, `bool`, `date32`, `date64`, `time64`

See [Datasource configuration overview](/datasource-config/overview#schema-definition) for column groups (`key`, `time`, `value`, `meta`).
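A quick pre-onboarding check against the type and group names above might look like the following sketch. The schema layout used here (group, then column name, then type) is a hypothetical shape chosen for illustration; the real schema definition is described in the overview linked above.

```python
# Column types and groups as listed in the text above.
COLUMN_TYPES = {
    "string", "int64", "uint64", "double", "float",
    "bool", "date32", "date64", "time64",
}
COLUMN_GROUPS = {"key", "time", "value", "meta"}


def validate_schema(schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means the schema passed."""
    problems = []
    for group, columns in schema.items():
        if group not in COLUMN_GROUPS:
            problems.append(f"unknown column group: {group}")
            continue
        for name, col_type in columns.items():
            if col_type not in COLUMN_TYPES:
                problems.append(f"{group}.{name}: unsupported type {col_type}")
    return problems


# Hypothetical schema layout: group -> {column name -> type}.
schema = {
    "key": {"symbol": "string"},
    "time": {"timestamp": "date64"},
    "value": {"price": "double", "size": "int32"},
}
problems = validate_schema(schema)
```

Here `int32` is flagged because it is not among the listed column types; switching it to `int64` leaves `problems` empty.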

## Formats added after version 2.0

The following ingest formats were added in later releases. Each is available on CloudQuant-managed environments running the listed version or later.

### Version 2.1 — PSV (pipe-separated values)

| Extension | Delimiter   | Notes                                           |
| --------- | ----------- | ----------------------------------------------- |
| `.psv`    | Pipe (`\|`) | First-class extension alongside CSV and Parquet |

PSV files use the same onboarding flow as CSV: header row, per-column type inference, and configurable null sentinel. You can also ingest pipe-delimited `.csv` or `.txt` files by setting `sep_override` to `"|"` without renaming the file.
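To make the three onboarding steps concrete, here is a minimal sketch that reads pipe-separated text with a header row, replaces a null sentinel, and infers a type per column. It is not the platform's inference algorithm: `read_psv`, `infer_type`, and the `null_sentinel` parameter name are hypothetical, and only three of the schema types are considered.

```python
import csv
import io


def infer_type(values: list) -> str:
    """Infer the narrowest of int64, double, or string that fits every non-null value."""
    present = [v for v in values if v is not None]
    for type_name, caster in (("int64", int), ("double", float)):
        try:
            for value in present:
                caster(value)
            return type_name
        except ValueError:
            continue  # at least one value does not fit; try the next wider type
    return "string"


def read_psv(text: str, null_sentinel: str = "") -> tuple:
    """Parse pipe-separated text with a header row; return (rows, inferred column types)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    rows = [
        {name: (None if value == null_sentinel else value) for name, value in row.items()}
        for row in reader
    ]
    types = {name: infer_type([row[name] for row in rows]) for name in reader.fieldnames}
    return rows, types


text = "symbol|price|size\nAAPL|189.5|100\nMSFT|NA|250\n"
rows, types = read_psv(text, null_sentinel="NA")
```

Without the sentinel, the `NA` value would force the `price` column to `string`, which is why the sentinel is worth configuring before inference runs.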

### Version 2.2 — PCAP / PCAPng (FIX tick capture)

| Extension | Description                    |
| --------- | ------------------------------ |
| `.pcap`   | Classic packet capture         |
| `.pcapng` | Next-generation packet capture |

Liberator extracts **FIX-protocol messages** from TCP payloads in packet captures and exposes them through the standard query API. Typical columns include FIX tags such as `35` (MsgType), `49` (SenderCompID), `52` (SendingTime), and `55` (Symbol), plus `_pcap_ts_ns` for the capture timestamp.
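For readers unfamiliar with the wire format, the sketch below shows how a single FIX tag=value message, once recovered from a TCP payload, could be turned into the columns named above. It covers only the field-splitting step and assumes the payload already holds exactly one complete message; packet reassembly is out of scope. `parse_fix` and `TAG_NAMES` are hypothetical names, and the sample message and timestamp are made up for illustration.

```python
SOH = b"\x01"  # FIX field delimiter in the standard tag=value encoding

# Tag names mentioned in the text above.
TAG_NAMES = {35: "MsgType", 49: "SenderCompID", 52: "SendingTime", 55: "Symbol"}


def parse_fix(payload: bytes) -> dict:
    """Split a tag=value FIX message into a {tag number: value} dictionary."""
    fields = {}
    for raw_field in payload.split(SOH):
        if not raw_field:
            continue  # skip the empty piece after the trailing SOH
        tag, _, value = raw_field.partition(b"=")
        fields[int(tag.decode("ascii"))] = value.decode("ascii")
    return fields


# Illustrative message only; BodyLength (9) and CheckSum (10) are placeholders
# and are not validated here.
payload = SOH.join([
    b"8=FIX.4.2", b"9=0", b"35=D", b"49=SENDER",
    b"52=20240102-14:30:00.000", b"55=AAPL", b"10=000",
]) + SOH
message = parse_fix(payload)
row = {TAG_NAMES[tag]: message[tag] for tag in TAG_NAMES}
row["_pcap_ts_ns"] = 1704205800000000000  # hypothetical capture timestamp in nanoseconds
```

A dictionary keyed by tag keeps only the last value of a repeated tag, so messages with repeating groups would need a list-based representation instead.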

Use PCAP datasets when you capture exchange feeds at the wire level and want the same query model as historical bar or trade datasets.

## Added in version 2.3

No new file or data ingest formats are part of the 2.3 release at this time. See [What's New in Liberator 2.2](/whats-new/liberator-2.2) for PCAP/PCAPng and [What's New in Liberator 2.1](/whats-new/liberator-2.1) for PSV.

## Where each format applies

All **file-based** formats in the tables above can be stored on any file-backed connection type:

| Connection                    | Guide                                       |
| ----------------------------- | ------------------------------------------- |
| Local / mounted directory     | [Local File](/datasource-config/local-file) |
| Amazon S3 (and S3-compatible) | [S3](/datasource-config/s3)                 |
| Azure Blob Storage            | [Azure Blob](/datasource-config/azure-blob) |
| SFTP                          | [SFTP](/datasource-config/sftp)             |
| FTPS                          | [FTPS](/datasource-config/ftps)             |
| CIFS / SMB                    | [CIFS](/datasource-config/cifs)             |

**Database** and **API** formats map to their respective connection guides:

| Connection | Guide                                       |
| ---------- | ------------------------------------------- |
| PostgreSQL | [PostgreSQL](/datasource-config/postgresql) |
| MySQL      | [MySQL](/datasource-config/mysql)           |
| SQL Server | [SQL Server](/datasource-config/mssql)      |
| Oracle     | [Oracle](/datasource-config/oracle)         |
| Snowflake  | [Snowflake](/datasource-config/snowflake)   |

## Choosing a format

| Use case                                             | Recommended format                    |
| ---------------------------------------------------- | ------------------------------------- |
| Human-readable exports from spreadsheets or ETL jobs | CSV or TSV                            |
| Vendor pipe-delimited daily drops                    | PSV (2.1+) or CSV with `sep_override` |
| Large historical archives, column pruning            | Parquet                               |
| Low-latency interchange between Arrow-native tools   | Arrow IPC / Feather                   |
| Excel exports from business users                    | XLSX                                  |
| Scientific simulation output                         | HDF5                                  |
| Hierarchical vendor XML feeds                        | XML                                   |
| SQL warehouse tables already in your estate          | Matching database connection          |
| Wire-level FIX tick replay                           | PCAP / PCAPng (2.2+)                  |

## Related

<CardGroup cols={2}>
  <Card title="Datasource overview" icon="database" href="/datasource-config/overview">
    Connection + dataset architecture and shared `data_args` fields
  </Card>

  <Card title="Local file reference" icon="file" href="/datasource-config/local-file">
    Full `data_args` reference for file-based sources
  </Card>
</CardGroup>
