Local file (CSV/TSV)

Local file datasources read CSV, TSV, or other delimited flat files from a directory on the CloudQuant Data Liberator server or a mounted filesystem. This is the simplest file-based connection type and serves as the foundation for understanding all other file-based sources.

See Supported Data Formats for every file extension Liberator can ingest on this connection, including Parquet, Arrow, Excel, XML, HDF5, PSV (2.1+), and PCAP (2.2+).

Connection configuration

Required fields

Field	Type	Description
`connection_type`	string	Must be `"file"`
`behavior`	string	Must be `"file"`
`location`	string	Absolute path to the directory containing data files

The location field should point to a directory, not an individual file. CloudQuant Data Liberator will scan the directory for files matching the file_pattern in data_args.

Example connection

{
  "name": "local-trades-connection",
  "connection_type": "file",
  "behavior": "file",
  "location": "/data/trades"
}

Dataset configuration (data_args)

All file-based datasources share the same data_args fields. These control how CloudQuant Data Liberator finds, parses, and interprets your files.

Required fields

Field	Type	Description
`file_pattern`	string	Glob pattern to match files, e.g., `"*.csv"`, `"prefix_*.tsv"`
`data_dt_column`	string or list	Column(s) containing the datetime value
`data_dt_format`	string or list	strptime format string, or special values: `"muts"`, `"uts"`, `"nuts"`, `"datetime"`, `"date"`
`data_key_column`	string or list	Column(s) used as the symbol/key for query filtering
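To preview which files a given file_pattern would select, shell-style glob matching can be reproduced with Python's standard fnmatch module. This is only a sketch: the filenames and the helper name match_files are hypothetical, and Liberator's own matching rules (case sensitivity, subdirectories) may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing; real filenames will differ.
filenames = [
    "trades_2024-01-02.csv",
    "trades_2024-01-03.csv",
    "quotes_2024-01-02.csv",
    "notes.txt",
]

def match_files(names, file_pattern):
    """Return the names that match a shell-style glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, file_pattern)]

csv_files = match_files(filenames, "*.csv")
trade_files = match_files(filenames, "trades_*.csv")

print(csv_files)
print(trade_files)
```

Checking a pattern this way before saving the dataset can catch a pattern that is too broad, such as "*.csv" also picking up the quotes file above.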

Optional fields

Field	Type	Default	Description
`sep_override`	string	`","`	Delimiter character: `","` (comma), `"\t"` (tab), `"|"` (pipe), `";"` (semicolon)
`encoding`	string	`"utf-8"`	File encoding (e.g., `"utf-8"`, `"latin-1"`, `"ascii"`)
`data_dt_timezone`	string	`"UTC"`	Timezone of source data, e.g., `"UTC"`, `"America/New_York"`
`fname_dt_regex`	string		Regex to extract a date from the filename
`fname_dt_format`	string		strptime format for the date extracted by `fname_dt_regex`
`fname_dt_timezone`	string		Timezone of the filename-derived date
`fname_dt_nudge`	int	`0`	Microsecond offset applied to filename-derived dates
`fname_dt_approx_seconds`	int		Approximate number of seconds of data per file (used for query optimization)
`arrow_sort`	list	`["symbol", "muts"]`	Sort order for the resulting Arrow table
`arrow_timestamp`	bool	`true`	Whether to generate the human-readable `timestamp` column

Set fname_dt_approx_seconds to 86400 for daily files. This helps CloudQuant Data Liberator skip files outside the query’s time range, significantly improving performance for large directories.
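The idea behind this optimization can be sketched in Python. The sketch assumes each file covers the interval starting at its filename-derived date and lasting fname_dt_approx_seconds, and that fname_dt_nudge is simply added to that start. The function names file_window and files_in_range are hypothetical, timezones are left out for brevity, and Liberator's internal logic may differ.

```python
import re
from datetime import datetime, timedelta

def file_window(filename, fname_dt_regex, fname_dt_format, approx_seconds, nudge_us=0):
    """Return the (start, end) interval a file is assumed to cover, or None."""
    match = re.search(fname_dt_regex, filename)
    if match is None:
        return None
    start = datetime.strptime(match.group(1), fname_dt_format)
    start += timedelta(microseconds=nudge_us)  # assumed meaning of fname_dt_nudge
    return start, start + timedelta(seconds=approx_seconds)

def files_in_range(filenames, query_start, query_end,
                   fname_dt_regex, fname_dt_format, approx_seconds):
    """Keep files whose assumed interval overlaps the query range."""
    keep = []
    for name in filenames:
        window = file_window(name, fname_dt_regex, fname_dt_format, approx_seconds)
        # Files with no extractable date are kept, since they cannot be ruled out.
        if window is None or (window[0] < query_end and window[1] > query_start):
            keep.append(name)
    return keep

selected = files_in_range(
    ["trades_2024-01-02.csv", "trades_2024-01-03.csv", "trades_2024-01-04.csv"],
    query_start=datetime(2024, 1, 3, 10, 0),
    query_end=datetime(2024, 1, 3, 16, 0),
    fname_dt_regex=r"trades_(\d{4}-\d{2}-\d{2})\.csv",
    fname_dt_format="%Y-%m-%d",
    approx_seconds=86400,
)
print(selected)
```

With daily files and a query confined to one day, only one file needs to be opened; without the filename date, every file in the directory would be a candidate.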

Complete example

Below is a full configuration showing both the connection and a dataset for daily trade CSV files.

Connection

{
  "name": "local-daily-trades",
  "connection_type": "file",
  "behavior": "file",
  "location": "/data/daily-trades"
}

Dataset

{
  "name": "us-equity-trades",
  "connection": "local-daily-trades",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": "trade_time",
    "data_dt_format": "%Y-%m-%d %H:%M:%S",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{4}-\\d{2}-\\d{2})\\.csv",
    "fname_dt_format": "%Y-%m-%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "trade_time", "type": "string", "group": "time", "description": "Trade timestamp" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "volume", "type": "int64", "group": "value", "description": "Trade volume" }
  ]
}

Ensure the CloudQuant Data Liberator process has read permissions on the location directory and all files within it. Permission errors will cause silent failures during query execution.
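Because these failures are silent, it can help to check permissions up front. The following is a minimal sketch using only the Python standard library; unreadable_paths is a hypothetical helper, and it is only meaningful when run as the same operating-system user as the Liberator process. The demo runs against a temporary directory rather than a real location.

```python
import os
import tempfile

def unreadable_paths(location):
    """Return paths under `location` that the current user cannot read."""
    # A directory needs both read and execute permission to be listed and entered.
    if not os.access(location, os.R_OK | os.X_OK):
        return [location]
    problems = []
    for entry in os.scandir(location):
        if entry.is_file() and not os.access(entry.path, os.R_OK):
            problems.append(entry.path)
    return problems

# Demo against a throwaway directory; substitute the connection's location.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "trades_2024-01-02.csv"), "w").close()
    demo_result = unreadable_paths(demo_dir)

print(demo_result)
```

An empty list means every file directly inside the directory is readable by the current user.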

Tab-separated files (TSV)

For TSV files, set sep_override to "\t":

{
  "data_args": {
    "file_pattern": "*.tsv",
    "sep_override": "\t",
    "data_dt_column": "date",
    "data_dt_format": "%Y%m%d",
    "data_dt_timezone": "UTC",
    "data_key_column": "ticker"
  }
}
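To sanity-check these settings against a sample of your data before registering the dataset, the same delimiter and datetime format can be tried with Python's csv and datetime modules. The sample rows below are made up for illustration, and this is not how Liberator itself reads files.

```python
import csv
import io
from datetime import datetime

# Made-up sample content using the column names from the data_args above.
sample_tsv = (
    "date\tticker\tclose\n"
    "20240102\tAAPL\t100.0\n"
    "20240102\tMSFT\t200.0\n"
)

reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t")  # sep_override
rows = []
for row in reader:
    # data_dt_column = "date", data_dt_format = "%Y%m%d"
    row["date"] = datetime.strptime(row["date"], "%Y%m%d")
    rows.append(row)

for row in rows:
    print(row["ticker"], row["date"].date(), row["close"])
```

If strptime raises a ValueError here, the data_dt_format does not match the file contents and needs adjusting.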

Composite key example

When the symbol is constructed from multiple columns:

{
  "data_key_column": [
    { "type": "column", "value": "exchange" },
    { "type": "literal", "value": "_" },
    { "type": "column", "value": "ticker" }
  ]
}

This produces keys like NYSE_AAPL, NASDAQ_MSFT, etc.
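The assembly of such a key can be illustrated with a short Python sketch. It assumes the parts are concatenated in order, with "column" entries read from the row and "literal" entries inserted as-is; build_key is a hypothetical name, not a Liberator function.

```python
key_spec = [
    {"type": "column", "value": "exchange"},
    {"type": "literal", "value": "_"},
    {"type": "column", "value": "ticker"},
]

def build_key(row, spec):
    """Concatenate column values and literals in the order given by the spec."""
    parts = []
    for part in spec:
        if part["type"] == "column":
            parts.append(str(row[part["value"]]))
        elif part["type"] == "literal":
            parts.append(part["value"])
        else:
            raise ValueError(f"unknown key part type: {part['type']!r}")
    return "".join(parts)

print(build_key({"exchange": "NYSE", "ticker": "AAPL"}, key_spec))
print(build_key({"exchange": "NASDAQ", "ticker": "MSFT"}, key_spec))
```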

Multiple datetime columns

When the date and time are in separate columns:

{
  "data_dt_column": ["trade_date", "trade_time"],
  "data_dt_format": ["%Y-%m-%d", "%H:%M:%S.%f"]
}

CloudQuant Data Liberator concatenates the columns with a space before parsing, so the effective format becomes "%Y-%m-%d %H:%M:%S.%f".
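The concatenation step can be mirrored in Python to verify that a pair of formats parses your data. This is a sketch of the behavior described above, with a made-up sample row and a hypothetical helper name parse_multi.

```python
from datetime import datetime

def parse_multi(row, data_dt_column, data_dt_format):
    """Join the column values and the formats with a space, then parse once."""
    text = " ".join(row[column] for column in data_dt_column)
    fmt = " ".join(data_dt_format)
    return datetime.strptime(text, fmt)

parsed = parse_multi(
    {"trade_date": "2024-01-02", "trade_time": "09:30:00.250000"},
    data_dt_column=["trade_date", "trade_time"],
    data_dt_format=["%Y-%m-%d", "%H:%M:%S.%f"],
)
print(parsed.isoformat())
```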
