Supported data formats
CloudQuant Data Liberator ingests time-series and tabular data through connections (where data lives) and datasets (how files or tables are interpreted). The tables below list every file extension and data source category the platform understands. Formats are selected automatically from your file_pattern, file extension, or connection type when you onboard through the Liberator UI. You rarely need to set anything manually.
This page is the canonical format reference. Each connection guide links here for the formats available on that storage type.
Supported in version 2.0
These formats were available in the Liberator 2.0 release and remain supported on current versions.
Delimited text files
| Extension | Delimiter | Configuration |
|---|---|---|
| .csv | Comma (default) | file_pattern such as *.csv |
| .tsv | Tab | file_pattern such as *.tsv |
| .txt | Auto-detected | Same pipeline as CSV; delimiter inferred when omitted |

Set sep_override in data_args to force comma, tab, pipe (|), or semicolon when auto-detection is not sufficient.
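
The interplay between delimiter inference and a forced separator can be illustrated with a minimal Python sketch. This is not Liberator's implementation: it uses the standard library's csv.Sniffer as a stand-in for auto-detection, and the read_rows helper and its sep_override argument are hypothetical names that only mirror the setting described above.

```python
import csv
import io

# Delimiters named in the text: comma, tab, pipe, semicolon.
CANDIDATES = ",\t|;"


def read_rows(text, sep_override=None):
    """Parse delimited text, inferring the delimiter unless one is forced.

    Hypothetical helper for illustration; not part of Liberator.
    """
    if sep_override is None:
        # csv.Sniffer guesses the delimiter from a sample of the text
        # and raises csv.Error when it cannot decide.
        sample = text[:4096]
        delimiter = csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter
    else:
        # A forced separator skips inference entirely.
        delimiter = sep_override
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


pipe_text = "symbol|price|size\nAAPL|189.5|100\nMSFT|410.2|50\n"

rows_auto = read_rows(pipe_text)                      # delimiter inferred
rows_forced = read_rows(pipe_text, sep_override="|")  # delimiter forced
```

Forcing the separator is the safer choice when a file has few rows or free-text fields that contain other candidate delimiters, since inference works from character frequencies in a sample.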
Columnar and binary files
| Extension | Description |
|---|---|
| .parquet | Apache Parquet; columnar reads with type pushdown |
| .arrow, .feather | Apache Arrow IPC (Feather v2) |
Structured office and scientific files
| Extension | Description | Extra configuration |
|---|---|---|
| .xlsx, .xls | Microsoft Excel workbooks | First sheet sampled at onboarding |
| .xml | XML documents | xml_args for element paths (set during onboarding) |
| .h5, .hdf5 | HDF5 scientific arrays | h5py_groups for dataset path inside the file |
Archives
| Extension | Description |
|---|---|
| .zip | ZIP archive; Liberator inspects the inner file and applies the matching format delegate |
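
The idea of inspecting an archive's inner file and dispatching on its extension can be sketched as follows. This is an illustration under stated assumptions, not the platform's code: the FORMAT_BY_EXTENSION map, its format labels, and the inner_format helper are hypothetical, and the extensions are taken from the version 2.0 tables above.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical extension-to-format map built from the 2.0 tables on this page.
FORMAT_BY_EXTENSION = {
    ".csv": "delimited", ".tsv": "delimited", ".txt": "delimited",
    ".parquet": "parquet",
    ".arrow": "arrow_ipc", ".feather": "arrow_ipc",
    ".xlsx": "excel", ".xls": "excel",
    ".xml": "xml",
    ".h5": "hdf5", ".hdf5": "hdf5",
}


def inner_format(zip_bytes):
    """Return (member name, format label) for the first recognized member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # ZIP member names always use forward slashes.
            suffix = PurePosixPath(name).suffix.lower()
            if suffix in FORMAT_BY_EXTENSION:
                return name, FORMAT_BY_EXTENSION[suffix]
    raise ValueError("no supported file found inside the archive")


# Build a small archive in memory so the sketch runs without touching disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("drops/trades_2024.CSV", "symbol,price\nAAPL,189.5\n")

detected = inner_format(buffer.getvalue())
```

Lower-casing the suffix before the lookup keeps the match independent of how the vendor capitalized the file name.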
Database sources
| Connection type | Engine | Access pattern |
|---|---|---|
| PostgreSQL | Native high-performance driver | Table or view per dataset |
| MySQL | ODBC (MariaDB-compatible) | Table or view per dataset |
| SQL Server | ODBC Driver 18 | Table or view per dataset |
| Oracle | Thin driver | Table or view per dataset |
| Snowflake | Native driver | Table or view per dataset |
API-backed datasets
REST endpoints that return JSON tabular payloads can be onboarded as API connections. The platform normalizes responses into the same query surface as file- and database-backed datasets.
Schema column types
Regardless of source format, dataset schemas use these column types: string, int64, uint64, double, float, bool, date32, date64, time64.
See Datasource configuration overview for column groups (key, time, value, meta).
Formats added after version 2.0
The following ingest formats were added in subsequent releases. They are available on current CloudQuant-managed environments at those versions and later.
Version 2.1 — PSV (pipe-separated values)
| Extension | Delimiter | Notes |
|---|---|---|
| .psv | Pipe (\|) | First-class extension alongside CSV and Parquet |

You can also read pipe-delimited .csv or .txt files by setting sep_override to "|" without renaming the file.
Version 2.2 — PCAP / PCAPng (FIX tick capture)
| Extension | Description |
|---|---|
| .pcap | Classic packet capture |
| .pcapng | Next-generation packet capture |

PCAP datasets expose FIX fields by tag number, such as 35 (MsgType), 49 (SenderCompID), 52 (SendingTime), and 55 (Symbol), plus _pcap_ts_ns for the capture timestamp.
Use PCAP datasets when you capture exchange feeds at the wire level and want the same query model as historical bar or trade datasets.
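
To make the tag-based fields concrete, here is a minimal Python sketch of splitting one FIX tag=value message into the tags named above. It is an illustration only: it does not read PCAP files, the parse_fix helper is hypothetical, the sample message is abbreviated (BodyLength and CheckSum are omitted), and keying the row by tag number is an assumption about how the fields are exposed.

```python
SOH = "\x01"  # standard FIX field delimiter (ASCII 0x01)

# The tags called out in this section, with their standard FIX names.
TAG_NAMES = {
    "35": "MsgType",
    "49": "SenderCompID",
    "52": "SendingTime",
    "55": "Symbol",
}


def parse_fix(message, capture_ts_ns):
    """Split a FIX tag=value message and keep only the tags listed above.

    Hypothetical helper; capture_ts_ns stands in for the packet timestamp.
    """
    row = {"_pcap_ts_ns": capture_ts_ns}
    for field in message.split(SOH):
        if not field:
            continue  # a trailing SOH yields an empty final piece
        tag, _, value = field.partition("=")
        if tag in TAG_NAMES:
            row[tag] = value
    return row


# Abbreviated new-order message; the values are invented for illustration.
raw = SOH.join([
    "8=FIX.4.4",
    "35=D",
    "49=BUYSIDE",
    "56=EXCH",
    "52=20240102-14:30:00.000",
    "55=AAPL",
    "54=1",
    "38=100",
]) + SOH

row = parse_fix(raw, capture_ts_ns=1704205800000000000)
```

Keeping the capture timestamp separate from tag 52 preserves both clocks: SendingTime is what the sender stamped, while the capture timestamp records when the packet was seen on the wire.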
Added in version 2.3
No new file or data ingest formats are part of the 2.3 release at this time. See What’s New in Liberator 2.2 for PCAP/PCAPng and What’s New in Liberator 2.1 for PSV.
Where each format applies
All file-based formats in the tables above can be stored on any file-backed connection type:

| Connection | Guide |
|---|---|
| Local / mounted directory | Local File |
| Amazon S3 (and S3-compatible) | S3 |
| Azure Blob Storage | Azure Blob |
| SFTP | SFTP |
| FTPS | FTPS |
| CIFS / SMB | CIFS |

Database sources use the connection type that matches the engine:

| Connection | Guide |
|---|---|
| PostgreSQL | PostgreSQL |
| MySQL | MySQL |
| SQL Server | SQL Server |
| Oracle | Oracle |
| Snowflake | Snowflake |
Choosing a format
| Use case | Recommended format |
|---|---|
| Human-readable exports from spreadsheets or ETL jobs | CSV or TSV |
| Vendor pipe-delimited daily drops | PSV (2.1+) or CSV with sep_override |
| Large historical archives, column pruning | Parquet |
| Low-latency interchange between Arrow-native tools | Arrow IPC / Feather |
| Excel exports from business users | XLSX |
| Scientific simulation output | HDF5 |
| Hierarchical vendor XML feeds | XML |
| SQL warehouse tables already in your estate | Matching database connection |
| Wire-level FIX tick replay | PCAP / PCAPng (2.2+) |
Related
Datasource overview: Connection + dataset architecture and shared data_args fields
Local file reference: Full data_args reference for file-based sources
