
Supported data formats

CloudQuant Data Liberator ingests time-series and tabular data through connections (where data lives) and datasets (how files or tables are interpreted). The tables below list every file extension and data source category the platform understands. Formats are selected automatically from your file_pattern, file extension, or connection type when you onboard through the Liberator UI. You rarely need to set anything manually.
This page is the canonical format reference. Each connection guide links here for the formats available on that storage type.

Supported in version 2.0

These formats were available in the Liberator 2.0 release and remain supported on current versions.

Delimited text files

| Extension | Delimiter | Configuration |
| --- | --- | --- |
| .csv | Comma (default) | file_pattern such as *.csv |
| .tsv | Tab | file_pattern such as *.tsv |
| .txt | Auto-detected | Same pipeline as CSV; delimiter inferred when omitted |

Set sep_override in data_args to force comma, tab, pipe (\|), or semicolon when auto-detection is not sufficient.
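
To make the idea of delimiter inference versus a forced override concrete, here is a minimal sketch built on Python's standard csv.Sniffer. It is not Liberator's detection logic. The pick_delimiter helper and the plain dict standing in for data_args are assumptions made for illustration; only the sep_override key name comes from this page.

```python
import csv
from typing import Optional

# Candidate delimiters named on this page: comma, tab, pipe, semicolon.
CANDIDATES = ",\t|;"


def pick_delimiter(sample: str, data_args: Optional[dict] = None) -> str:
    """Return the forced delimiter if sep_override is set, otherwise sniff one.

    Hypothetical helper: data_args is modelled as a plain dict.
    """
    override = (data_args or {}).get("sep_override")
    if override:
        return override
    # csv.Sniffer raises csv.Error when it cannot settle on a delimiter,
    # which is the situation where a manual override becomes necessary.
    return csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter


sample = "symbol|price|size\nAAPL|189.50|100\nMSFT|410.10|250\n"
sniffed = pick_delimiter(sample)
forced = pick_delimiter(sample, {"sep_override": ","})
print(repr(sniffed), repr(forced))
```

The override path never inspects the file contents, which is why it helps with files whose first rows are ambiguous.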

Columnar and binary files

| Extension | Description |
| --- | --- |
| .parquet | Apache Parquet; columnar reads with type pushdown |
| .arrow, .feather | Apache Arrow IPC (Feather v2) |

On S3, Parquet datasets can use passthrough mode (direct read without intermediate caching). See the S3 Parquet example.

Structured office and scientific files

| Extension | Description | Extra configuration |
| --- | --- | --- |
| .xlsx, .xls | Microsoft Excel workbooks | First sheet sampled at onboarding |
| .xml | XML documents | xml_args for element paths (set during onboarding) |
| .h5, .hdf5 | HDF5 scientific arrays | h5py_groups for dataset path inside the file |

Archives

| Extension | Description |
| --- | --- |
| .zip | ZIP archive; Liberator inspects the inner file and applies the matching format delegate |
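
The inspect-then-dispatch behaviour described for ZIP archives can be sketched with Python's standard zipfile module. This is an illustration only, not Liberator's implementation: the FORMAT_BY_EXTENSION table, its format labels, and the inner_format helper are hypothetical, and the sketch simply picks the first recognised member.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical lookup built from the 2.0 extensions listed on this page.
FORMAT_BY_EXTENSION = {
    ".csv": "delimited", ".tsv": "delimited", ".txt": "delimited",
    ".parquet": "parquet", ".arrow": "arrow", ".feather": "arrow",
    ".xlsx": "excel", ".xls": "excel", ".xml": "xml",
    ".h5": "hdf5", ".hdf5": "hdf5",
}


def inner_format(zip_bytes: bytes) -> tuple:
    """Return (member name, format label) for the first recognised member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            ext = PurePosixPath(name).suffix.lower()
            if ext in FORMAT_BY_EXTENSION:
                return name, FORMAT_BY_EXTENSION[ext]
    raise ValueError("no supported file found inside the archive")


# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("trades.csv", "symbol,price\nAAPL,189.50\n")

print(inner_format(buffer.getvalue()))
```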

Database sources

| Connection type | Engine | Access pattern |
| --- | --- | --- |
| PostgreSQL | Native high-performance driver | Table or view per dataset |
| MySQL | ODBC (MariaDB-compatible) | Table or view per dataset |
| SQL Server | ODBC Driver 18 | Table or view per dataset |
| Oracle | Thin driver | Table or view per dataset |
| Snowflake | Native driver | Table or view per dataset |

API-backed datasets

REST endpoints that return JSON tabular payloads can be onboarded as API connections. The platform normalizes responses into the same query surface as file- and database-backed datasets.
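
As a rough picture of what normalizing a JSON tabular payload involves, the sketch below flattens a list of JSON records into a column list and uniform rows. The payload shape, the "data" wrapper key, and the normalize_records helper are assumptions for illustration; the actual response handling in Liberator is not documented on this page.

```python
import json


def normalize_records(payload: str, records_key: str = "data"):
    """Flatten a JSON payload into (columns, rows).

    Accepts either a bare list of records or an object that wraps the
    list under records_key (a hypothetical convention).
    """
    doc = json.loads(payload)
    records = doc[records_key] if isinstance(doc, dict) else doc

    # Collect columns in first-seen order so the output is deterministic.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    # Missing fields become None so every row has the same width.
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows


payload = (
    '{"data": [{"symbol": "AAPL", "price": 189.5},'
    ' {"symbol": "MSFT", "price": 410.1, "size": 250}]}'
)
columns, rows = normalize_records(payload)
print(columns)
print(rows)
```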

Schema column types

Regardless of source format, dataset schemas use these column types: string, int64, uint64, double, float, bool, date32, date64, time64. See Datasource configuration overview for column groups (key, time, value, meta).
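
A quick way to catch an unsupported type before onboarding is to check a draft schema against the list above. The sketch below assumes a schema expressed as a simple column-name-to-type mapping, which is a hypothetical shape chosen for illustration rather than Liberator's schema format.

```python
# The nine column types listed on this page.
ALLOWED_TYPES = {
    "string", "int64", "uint64", "double", "float",
    "bool", "date32", "date64", "time64",
}


def invalid_columns(schema: dict) -> dict:
    """Return the columns whose declared type is not in ALLOWED_TYPES."""
    return {name: kind for name, kind in schema.items() if kind not in ALLOWED_TYPES}


# Hypothetical draft schema; "int32" is not one of the supported types.
draft = {"symbol": "string", "timestamp": "date64", "price": "double", "size": "int32"}
print(invalid_columns(draft))
```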

Formats added after version 2.0

The following ingest formats were added in subsequent releases. They are available on current CloudQuant-managed environments at those versions and later.

Version 2.1 — PSV (pipe-separated values)

| Extension | Delimiter | Notes |
| --- | --- | --- |
| .psv | Pipe (\|) | First-class extension alongside CSV and Parquet |

PSV files use the same onboarding flow as CSV: header row, per-column type inference, and configurable null sentinel. You can also ingest pipe-delimited .csv or .txt files by setting sep_override to "\|" without renaming the file.
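
The three onboarding steps named above (header row, per-column type inference, null sentinel) can be sketched with Python's standard csv module. This is a simplified illustration, not the platform's inference rules: the read_psv and infer_type helpers, the "NA" sentinel, and the restriction to int64, double, and string are all assumptions.

```python
import csv
import io


def infer_type(values):
    """Pick int64, double, or string, whichever first fits all non-null values."""
    present = [v for v in values if v is not None]
    if not present:
        return "string"  # nothing to infer from
    for label, cast in (("int64", int), ("double", float)):
        try:
            for value in present:
                cast(value)
            return label
        except ValueError:
            continue
    return "string"


def read_psv(text, null_sentinel="NA"):
    """Parse pipe-separated text into (header, rows, inferred types)."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    header = next(reader)  # first row is the header
    rows = [
        [None if cell == null_sentinel else cell for cell in row]
        for row in reader
    ]
    types = {
        name: infer_type([row[i] for row in rows])
        for i, name in enumerate(header)
    }
    return header, rows, types


text = "symbol|price|size\nAAPL|189.50|100\nMSFT|NA|250\n"
header, rows, types = read_psv(text)
print(types)
```

Replacing the sentinel before inference matters: otherwise a single "NA" would push a numeric column to string.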

Version 2.2 — PCAP / PCAPng (FIX tick capture)

| Extension | Description |
| --- | --- |
| .pcap | Classic packet capture |
| .pcapng | Next-generation packet capture |

Liberator extracts FIX-protocol messages from TCP payloads in packet captures and exposes them through the standard query API. Typical columns include FIX tags such as 35 (MsgType), 49 (SenderCompID), 52 (SendingTime), and 55 (Symbol), plus _pcap_ts_ns for the capture timestamp. Use PCAP datasets when you capture exchange feeds at the wire level and want the same query model as historical bar or trade datasets.
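
For readers unfamiliar with FIX on the wire, the sketch below parses one tag=value message from a raw TCP payload, using the SOH byte (0x01) that separates FIX fields. It covers only the message-parsing step, not packet capture decoding or TCP reassembly, and it is not Liberator's extractor. The parse_fix helper and the sample message (abbreviated, without BodyLength or CheckSum) are illustrative assumptions.

```python
SOH = b"\x01"  # FIX field separator

# Tag names mentioned on this page.
TAG_NAMES = {35: "MsgType", 49: "SenderCompID", 52: "SendingTime", 55: "Symbol"}


def parse_fix(payload: bytes) -> dict:
    """Split a tag=value FIX message into {tag number: value}.

    Simplification: a repeated tag (as in repeating groups) keeps only
    its last value.
    """
    fields = {}
    for pair in payload.split(SOH):
        if not pair:  # trailing SOH leaves an empty chunk
            continue
        tag, _, value = pair.partition(b"=")
        fields[int(tag)] = value.decode("ascii")
    return fields


# Abbreviated sample message, made up for the example.
payload = SOH.join([
    b"8=FIX.4.4",
    b"35=D",
    b"49=BUYSIDE",
    b"52=20240102-14:30:00.000",
    b"55=AAPL",
]) + SOH

message = parse_fix(payload)
named = {TAG_NAMES[tag]: value for tag, value in message.items() if tag in TAG_NAMES}
print(named)
```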

Added in version 2.3

No new file or data ingest formats are part of the 2.3 release at this time. See What’s New in Liberator 2.2 for PCAP/PCAPng and What’s New in Liberator 2.1 for PSV.

Where each format applies

All file-based formats in the tables above can be stored on any file-backed connection type:

| Connection | Guide |
| --- | --- |
| Local / mounted directory | Local File |
| Amazon S3 (and S3-compatible) | S3 |
| Azure Blob Storage | Azure Blob |
| SFTP | SFTP |
| FTPS | FTPS |
| CIFS / SMB | CIFS |

Database and API formats map to their respective connection guides:

| Connection | Guide |
| --- | --- |
| PostgreSQL | PostgreSQL |
| MySQL | MySQL |
| SQL Server | SQL Server |
| Oracle | Oracle |
| Snowflake | Snowflake |

Choosing a format

| Use case | Recommended format |
| --- | --- |
| Human-readable exports from spreadsheets or ETL jobs | CSV or TSV |
| Vendor pipe-delimited daily drops | PSV (2.1+) or CSV with sep_override |
| Large historical archives, column pruning | Parquet |
| Low-latency interchange between Arrow-native tools | Arrow IPC / Feather |
| Excel exports from business users | XLSX |
| Scientific simulation output | HDF5 |
| Hierarchical vendor XML feeds | XML |
| SQL warehouse tables already in your estate | Matching database connection |
| Wire-level FIX tick replay | PCAP / PCAPng (2.2+) |

Datasource overview: Connection + dataset architecture and shared data_args fields

Local file reference: Full data_args reference for file-based sources