Supported data formats

CloudQuant Data Liberator ingests time-series and tabular data through connections (where data lives) and datasets (how files or tables are interpreted). The tables below list every file extension and data source category the platform understands. Formats are selected automatically from your file_pattern, file extension, or connection type when you onboard through the Liberator UI. You rarely need to set anything manually.
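
As an illustration of extension-based selection, here is a minimal Python sketch. It assumes a file is first filtered by file_pattern (glob-style matching) and then classified by its extension. The format labels and the detect_format helper are hypothetical and are not part of the Liberator API.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension -> format labels, mirroring the tables on this page.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".txt": "csv",  # same pipeline as CSV; delimiter inferred
    ".psv": "psv",
    ".parquet": "parquet",
    ".arrow": "arrow",
    ".feather": "arrow",
    ".xlsx": "excel",
    ".xls": "excel",
    ".xml": "xml",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
    ".zip": "zip",
    ".pcap": "pcap",
    ".pcapng": "pcapng",
}


def detect_format(filename: str, file_pattern: str = "*") -> Optional[str]:
    """Return a format label, or None if the file is filtered out or unknown."""
    if not fnmatch(filename, file_pattern):
        return None
    return EXTENSION_FORMATS.get(PurePosixPath(filename).suffix.lower())
```

For example, detect_format("prices_2024.csv", "*.csv") returns "csv", while a .parquet file checked against the same pattern returns None because the pattern filters it out.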

This page is the canonical format reference. Each connection guide links here for the formats available on that storage type.

Supported in version 2.0

These formats were available in the Liberator 2.0 release and remain supported on current versions.

Delimited text files

Extension	Delimiter	Configuration
`.csv`	Comma (default)	`file_pattern` such as `*.csv`
`.tsv`	Tab	`file_pattern` such as `*.tsv`
`.txt`	Auto-detected	Same pipeline as CSV; delimiter inferred when omitted

Set sep_override in data_args to force comma, tab, pipe (|), or semicolon when auto-detection is not sufficient.
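
The override, then extension default, then detection order can be sketched as follows. This is an illustration, not Liberator's implementation: resolve_delimiter is a hypothetical helper, and Python's csv.Sniffer stands in for whatever detection the platform actually performs. The candidate set is the four delimiters listed above.

```python
import csv
from typing import Optional

# Extension defaults from the table above; .txt has none, so it is sniffed.
DEFAULT_DELIMITERS = {".csv": ",", ".tsv": "\t"}
# The delimiters sep_override may force.
ALLOWED_OVERRIDES = {",", "\t", "|", ";"}


def resolve_delimiter(extension: str, sample: str, sep_override: Optional[str] = None) -> str:
    """Pick a delimiter: explicit override first, then default, then sniffing."""
    if sep_override is not None:
        if sep_override not in ALLOWED_OVERRIDES:
            raise ValueError(f"unsupported sep_override: {sep_override!r}")
        return sep_override
    if extension in DEFAULT_DELIMITERS:
        return DEFAULT_DELIMITERS[extension]
    # Content sniffing restricted to the allowed candidates;
    # raises csv.Error if no consistent delimiter is found.
    dialect = csv.Sniffer().sniff(sample, delimiters="".join(sorted(ALLOWED_OVERRIDES)))
    return dialect.delimiter
```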

Columnar and binary files

Extension	Description
`.parquet`	Apache Parquet; columnar reads with type pushdown
`.arrow`, `.feather`	Apache Arrow IPC (Feather v2)

On S3, Parquet datasets can use passthrough mode (direct read without intermediate caching). See the S3 Parquet example.

Structured office and scientific files

Extension	Description	Extra configuration
`.xlsx`, `.xls`	Microsoft Excel workbooks	First sheet sampled at onboarding
`.xml`	XML documents	`xml_args` for element paths (set during onboarding)
`.h5`, `.hdf5`	HDF5 scientific arrays	`h5py_groups` for dataset path inside the file
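
Element paths identify which repeating XML element becomes a row. The sketch below assumes a simple layout where each matched element's children become columns. row_path is a hypothetical stand-in for whatever key xml_args actually uses, and the path syntax is the limited XPath subset supported by Python's ElementTree.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def xml_to_rows(xml_text: str, row_path: str) -> List[Dict[str, str]]:
    """Turn each element matched by row_path into {child tag: child text}."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.findall(row_path):
        # Each child element becomes one column; empty text becomes "".
        rows.append({child.tag: (child.text or "").strip() for child in element})
    return rows
```

With a document shaped like <feed><bars><bar>...</bar></bars></feed>, a path of "./bars/bar" yields one row per bar element.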

Archives

Extension	Description
`.zip`	ZIP archive; Liberator inspects the inner file and applies the matching format delegate
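
A minimal sketch of inner-file inspection, assuming the archive holds a data file whose extension identifies its format. The INNER_FORMATS mapping and the inner_format helper are hypothetical; Liberator's actual delegate selection may treat multi-file archives differently.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Tuple

# Hypothetical subset of extension -> format labels used for delegation.
INNER_FORMATS = {".csv": "csv", ".tsv": "tsv", ".psv": "psv", ".parquet": "parquet", ".xml": "xml"}


def inner_format(zip_bytes: bytes) -> Tuple[str, str]:
    """Return (member name, format label) for the first recognized member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            fmt = INNER_FORMATS.get(PurePosixPath(info.filename).suffix.lower())
            if fmt is not None:
                return info.filename, fmt
    raise ValueError("no member with a supported extension")
```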

Database sources

Connection type	Engine	Access pattern
PostgreSQL	Native high-performance driver	Table or view per dataset
MySQL	ODBC (MariaDB-compatible)	Table or view per dataset
SQL Server	ODBC Driver 18	Table or view per dataset
Oracle	Thin driver	Table or view per dataset
Snowflake	Native driver	Table or view per dataset

API-backed datasets

REST endpoints that return JSON tabular payloads can be onboarded as API connections. The platform normalizes responses into the same query surface as file- and database-backed datasets.
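
To illustrate what normalizing a JSON payload into a tabular shape involves, the sketch below assumes the endpoint returns a list of flat records, optionally wrapped in a "data" envelope. Both the envelope key and normalize_records are hypothetical; keys missing from a record become None so every column has one value per row.

```python
import json
from typing import Any, Dict, List


def normalize_records(payload: str) -> Dict[str, List[Any]]:
    """Convert a JSON array of flat records into column -> values lists."""
    records: Any = json.loads(payload)
    # Hypothetical envelope: some APIs wrap the rows, e.g. {"data": [...]}.
    if isinstance(records, dict) and "data" in records:
        records = records["data"]
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    return {col: [record.get(col) for record in records] for col in columns}
```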

Schema column types

Regardless of source format, dataset schemas use these column types: string, int64, uint64, double, float, bool, date32, date64, time64. See Datasource configuration overview for column groups (key, time, value, meta).
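
A hypothetical validation sketch: each column carries one of the types listed above and one of the four column groups. The Column dataclass and validate_schema helper are illustrative only and do not reflect Liberator's actual configuration format.

```python
from dataclasses import dataclass
from typing import List

# Column types and groups as listed on this page.
COLUMN_TYPES = {"string", "int64", "uint64", "double", "float", "bool", "date32", "date64", "time64"}
COLUMN_GROUPS = {"key", "time", "value", "meta"}


@dataclass
class Column:
    name: str
    dtype: str
    group: str


def validate_schema(columns: List[Column]) -> List[str]:
    """Return human-readable problems; an empty list means the schema passes."""
    problems: List[str] = []
    seen = set()
    for col in columns:
        if col.name in seen:
            problems.append(f"{col.name}: duplicate column name")
        seen.add(col.name)
        if col.dtype not in COLUMN_TYPES:
            problems.append(f"{col.name}: unknown type {col.dtype!r}")
        if col.group not in COLUMN_GROUPS:
            problems.append(f"{col.name}: unknown group {col.group!r}")
    return problems
```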

Formats added after version 2.0

The following ingest formats were added in later releases. Each is available on CloudQuant-managed environments running the listed version or later.

Version 2.1 — PSV (pipe-separated values)

Extension	Delimiter	Notes
`.psv`	Pipe (`|`)	First-class extension alongside CSV and Parquet

PSV files use the same onboarding flow as CSV: header row, per-column type inference, and configurable null sentinel. You can also ingest pipe-delimited .csv or .txt files by setting sep_override to "|" without renaming the file.
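
The header row, per-column type inference, and null sentinel steps can be sketched with Python's csv module. This assumes a simple inference rule (a column becomes int if every non-null value parses as an integer, else float, else string). read_psv is a hypothetical helper, and Liberator's own inference likely recognizes more types, such as dates and booleans.

```python
import csv
import io
from typing import Any, Callable, Dict, List


def _column_caster(values: List[str], null_sentinel: str) -> Callable[[str], Any]:
    """Pick int, then float, then str: the first that parses every non-null value."""
    non_null = [v for v in values if v != null_sentinel]
    for cast in (int, float):
        try:
            for v in non_null:
                cast(v)
        except ValueError:
            continue
        return cast
    return str


def read_psv(text: str, null_sentinel: str = "") -> Dict[str, List[Any]]:
    """Parse pipe-separated text with a header row into typed columns."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    columns: Dict[str, List[Any]] = {}
    for i, name in enumerate(header):
        raw = [row[i] for row in rows]
        cast = _column_caster(raw, null_sentinel)
        columns[name] = [None if v == null_sentinel else cast(v) for v in raw]
    return columns
```

Because inference is decided per column, a single decimal value such as 189.5 turns the whole column into floats rather than leaving a mix of integers and floats.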

Version 2.2 — PCAP / PCAPng (FIX tick capture)

Extension	Description
`.pcap`	Classic packet capture
`.pcapng`	Next-generation packet capture

Liberator extracts FIX-protocol messages from TCP payloads in packet captures and exposes them through the standard query API. Typical columns include FIX tags such as 35 (MsgType), 49 (SenderCompID), 52 (SendingTime), and 55 (Symbol), plus _pcap_ts_ns for the capture timestamp. Use PCAP datasets when you capture exchange feeds at the wire level and want the same query model as historical bar or trade datasets.
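
FIX messages are sequences of tag=value fields separated by the SOH (0x01) character, so the tag columns above follow directly once a payload is in hand. The sketch below assumes a little-endian classic .pcap file and one complete FIX message per TCP payload. It skips Ethernet, IP, and TCP header parsing as well as stream reassembly, and parse_record_header and fix_row are hypothetical helpers.

```python
import struct
from typing import Any, Dict, Tuple

SOH = "\x01"  # FIX field delimiter


def parse_record_header(header: bytes) -> Tuple[int, int, int, int]:
    """Unpack a 16-byte classic pcap record header (little-endian assumed).

    Returns (ts_sec, ts_frac, incl_len, orig_len). ts_frac is microseconds,
    or nanoseconds when the file uses the nanosecond-resolution magic number.
    """
    return struct.unpack("<IIII", header)


def fix_row(payload: bytes, ts_sec: int, ts_frac: int, nanosecond_resolution: bool = False) -> Dict[str, Any]:
    """Split one FIX message into {tag: value} and add _pcap_ts_ns."""
    row: Dict[str, Any] = {}
    for field in payload.decode("ascii").split(SOH):
        if not field:
            continue  # the trailing SOH leaves an empty field
        tag, _, value = field.partition("=")
        row[tag] = value
    frac_ns = ts_frac if nanosecond_resolution else ts_frac * 1_000
    row["_pcap_ts_ns"] = ts_sec * 1_000_000_000 + frac_ns
    return row
```

In this sketch, tags stay as string keys such as "35" and "55", matching how the column names above are written.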

Added in version 2.3

No new file or data ingest formats are part of the 2.3 release at this time. See What’s New in Liberator 2.2 for PCAP/PCAPng and What’s New in Liberator 2.1 for PSV.

Where each format applies

All file-based formats in the tables above can be stored on any file-backed connection type:

Connection	Guide
Local / mounted directory	Local File
Amazon S3 (and S3-compatible)	S3
Azure Blob Storage	Azure Blob
SFTP	SFTP
FTPS	FTPS
CIFS / SMB	CIFS

Database and API formats map to their respective connection guides:

Connection	Guide
PostgreSQL	PostgreSQL
MySQL	MySQL
SQL Server	SQL Server
Oracle	Oracle
Snowflake	Snowflake

Choosing a format

Use case	Recommended format
Human-readable exports from spreadsheets or ETL jobs	CSV or TSV
Vendor pipe-delimited daily drops	PSV (2.1+) or CSV with `sep_override`
Large historical archives, column pruning	Parquet
Low-latency interchange between Arrow-native tools	Arrow IPC / Feather
Excel exports from business users	XLSX
Scientific simulation output	HDF5
Hierarchical vendor XML feeds	XML
SQL warehouse tables already in your estate	Matching database connection
Wire-level FIX tick replay	PCAP / PCAPng (2.2+)

Related

Datasource overview

Connection + dataset architecture and shared data_args fields

Local file reference

Full data_args reference for file-based sources
