# Local File (CSV/TSV)

> Configure local or mounted file datasources for CloudQuant Data Liberator

Local file datasources read CSV, TSV, or other delimited flat files from a directory on the CloudQuant Data Liberator server or on a filesystem mounted there. This is the simplest file-based connection type. Its `data_args` fields are shared by every other file-based source, so it is a good place to learn them.

<Note>
  See [Supported Data Formats](/datasource-config/supported-formats) for every file extension Liberator can ingest on this connection, including Parquet, Arrow, Excel, XML, HDF5, PSV (2.1+), and PCAP (2.2+).
</Note>

## Connection configuration

### Required fields

| Field             | Type   | Description                                          |
| ----------------- | ------ | ---------------------------------------------------- |
| `connection_type` | string | Must be `"file"`                                     |
| `behavior`        | string | Must be `"file"`                                     |
| `location`        | string | Absolute path to the directory containing data files |

<Note>
  The `location` field should point to a **directory**, not an individual file. CloudQuant Data Liberator will scan the directory for files matching the `file_pattern` in `data_args`.
</Note>
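
To preview which files a given `file_pattern` will pick up, the following Python sketch lists the matching files in a `location` directory. It assumes shell-style glob semantics and a non-recursive, top-level scan. The helper name `matching_files` is hypothetical and is not part of Liberator.

```python
from pathlib import Path


def matching_files(location: str, file_pattern: str) -> list[Path]:
    """Return files directly inside `location` whose names match `file_pattern`.

    Assumption: only the top level of the directory is scanned (no recursion).
    """
    root = Path(location)
    if not root.is_dir():
        # location must be a directory, not an individual file
        raise NotADirectoryError(f"location must be a directory: {location}")
    # Path.glob applies shell-style wildcards (*, ?, [...]) to entry names.
    return sorted(p for p in root.glob(file_pattern) if p.is_file())
```

For example, `matching_files("/data/trades", "*.csv")` returns the CSV files in that directory, sorted by path.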

### Example connection

```json
{
  "name": "local-trades-connection",
  "connection_type": "file",
  "behavior": "file",
  "location": "/data/trades"
}
```
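
Before registering a connection, you can check it against the required fields in the table above. The Python sketch below is a hypothetical pre-flight helper, and `validate_file_connection` is not a Liberator API. It checks only the documented required fields and does not confirm that the directory exists.

```python
import os


def validate_file_connection(conn: dict) -> list[str]:
    """Return a list of problems with a file connection config (empty if none)."""
    problems = []
    # Both connection_type and behavior must be the literal string "file".
    for field in ("connection_type", "behavior"):
        if conn.get(field) != "file":
            problems.append(f'{field} must be "file"')
    # location must be an absolute path (to a directory).
    location = conn.get("location")
    if not isinstance(location, str) or not os.path.isabs(location):
        problems.append("location must be an absolute path")
    return problems
```

An empty result for the example connection above means all three required fields have acceptable values.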

## Dataset configuration (data\_args)

All file-based datasources share the same `data_args` fields. These control how CloudQuant Data Liberator finds, parses, and interprets your files.

### Required fields

| Field             | Type           | Description                                                                                    |
| ----------------- | -------------- | ---------------------------------------------------------------------------------------------- |
| `file_pattern`    | string         | Glob pattern to match files, e.g., `"*.csv"`, `"prefix_*.tsv"`                                 |
| `data_dt_column`  | string or list | Column(s) containing the datetime value                                                        |
| `data_dt_format`  | string or list | strptime format string, or special values: `"muts"`, `"uts"`, `"nuts"`, `"datetime"`, `"date"` |
| `data_key_column` | string or list | Column(s) used as the symbol/key for query filtering                                           |
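
The sketch below shows how the required fields combine for a single row. It takes the key from `data_key_column` and parses `data_dt_column` with a `data_dt_format` strptime string, using Python's `datetime.strptime`. It covers only the plain single-column case with an ordinary strptime string, not the special values such as `"muts"`. The name `row_key_and_time` is hypothetical.

```python
from datetime import datetime


def row_key_and_time(row: dict, data_key_column: str,
                     data_dt_column: str, data_dt_format: str) -> tuple[str, datetime]:
    """Return (key, naive datetime) for one parsed row.

    Hypothetical helper: one key column, one datetime column, and an
    ordinary strptime format string.
    """
    key = row[data_key_column]
    when = datetime.strptime(row[data_dt_column], data_dt_format)
    return key, when
```

For a CSV row such as `{"symbol": "AAPL", "trade_time": "2024-03-15 09:30:00"}`, calling `row_key_and_time(row, "symbol", "trade_time", "%Y-%m-%d %H:%M:%S")` yields the key `"AAPL"` and a naive 09:30 timestamp. The timestamp is not yet tied to any timezone.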

### Optional fields

| Field                     | Type   | Default              | Description                                                                        |
| ------------------------- | ------ | -------------------- | ---------------------------------------------------------------------------------- |
| `sep_override`            | string | `","`                | Delimiter character: `","` (comma), `"\t"` (tab), `"\|"` (pipe), `";"` (semicolon) |
| `encoding`                | string | `"utf-8"`            | File encoding (e.g., `"utf-8"`, `"latin-1"`, `"ascii"`)                            |
| `data_dt_timezone`        | string | `"UTC"`              | Timezone of source data, e.g., `"UTC"`, `"America/New_York"`                       |
| `fname_dt_regex`          | string |                      | Regex to extract a date from the filename                                          |
| `fname_dt_format`         | string |                      | strptime format for the date extracted by `fname_dt_regex`                         |
| `fname_dt_timezone`       | string |                      | Timezone of the filename-derived date                                              |
| `fname_dt_nudge`          | int    | `0`                  | Microsecond offset applied to filename-derived dates                               |
| `fname_dt_approx_seconds` | int    |                      | Approximate number of seconds of data per file (used for query optimization)       |
| `arrow_sort`              | list   | `["symbol", "muts"]` | Sort order for the resulting Arrow table                                           |
| `arrow_timestamp`         | bool   | `true`               | Whether to generate the human-readable `timestamp` column                          |
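
`data_dt_timezone` says how naive timestamps in the file should be interpreted. The sketch below shows what that interpretation amounts to in Python. It attaches the source timezone with `zoneinfo`, converts to UTC, and expresses the result as microseconds since the Unix epoch. Treating that integer as the `muts` value used in the default `arrow_sort` is an assumption for illustration, not documented behavior. The name `to_utc_micros` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_micros(naive: datetime, tz_name: str) -> int:
    """Interpret a naive timestamp in `tz_name` and return UTC epoch microseconds.

    Assumption: this mirrors what data_dt_timezone implies; it is not
    Liberator's internal code.
    """
    aware = naive.replace(tzinfo=ZoneInfo(tz_name))  # attach the source timezone
    # Subtracting aware datetimes gives an exact timedelta (compared in UTC);
    # floor-dividing by one microsecond turns it into an integer count.
    return (aware - EPOCH) // timedelta(microseconds=1)
```

With `"America/New_York"`, 09:30 on 2024-03-15 falls in daylight saving time (UTC-4), so it maps to the same instant as 13:30 UTC.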

<Tip>
  Set `fname_dt_approx_seconds` to `86400` for daily files. This helps CloudQuant Data Liberator skip files outside the query's time range, significantly improving performance for large directories.
</Tip>
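
The pruning idea behind `fname_dt_approx_seconds` can be sketched as an interval-overlap test. The Python below assumes each file covers the half-open window `[file_start, file_start + approx_seconds)`, where `file_start` is the filename-derived date, and that the query range is also half-open. Liberator's actual rule may differ, and `file_may_overlap` is a hypothetical name.

```python
from datetime import datetime, timedelta


def file_may_overlap(file_start: datetime, approx_seconds: int,
                     query_start: datetime, query_end: datetime) -> bool:
    """Return True if a file's approximate time window overlaps the query.

    Illustration of the pruning idea only: a file whose window does not
    intersect [query_start, query_end) never needs to be opened.
    """
    file_end = file_start + timedelta(seconds=approx_seconds)
    # Two half-open intervals overlap iff each starts before the other ends.
    return file_start < query_end and query_start < file_end
```

With daily files (`86400`) and a query for 09:30 to 16:00 on 2024-03-15, only the 2024-03-15 file overlaps. The 2024-03-14 and 2024-03-16 files can be skipped without being read.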

## Complete example

Below is a full configuration showing both the connection and a dataset for daily trade CSV files.

### Connection

```json
{
  "name": "local-daily-trades",
  "connection_type": "file",
  "behavior": "file",
  "location": "/data/daily-trades"
}
```

### Dataset

```json
{
  "name": "us-equity-trades",
  "connection": "local-daily-trades",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": "trade_time",
    "data_dt_format": "%Y-%m-%d %H:%M:%S",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{4}-\\d{2}-\\d{2})\\.csv",
    "fname_dt_format": "%Y-%m-%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "trade_time", "type": "string", "group": "time", "description": "Trade timestamp" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "volume", "type": "int64", "group": "value", "description": "Trade volume" }
  ]
}
```
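
The three `fname_dt_*` fields in this example work together. The regex isolates the date text, the format parses it, and the timezone anchors it. A minimal Python sketch of that pipeline follows. It assumes the first capture group holds the date and that `re.search` semantics apply. `date_from_filename` is a hypothetical helper. The doubled backslashes in the JSON decode to single backslashes, which is why the Python version uses a raw string.

```python
import re
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo


def date_from_filename(fname: str, fname_dt_regex: str,
                       fname_dt_format: str, fname_dt_timezone: str) -> Optional[datetime]:
    """Extract a timezone-aware date from a filename, or None if it does not match.

    Assumption: capture group 1 contains the date text.
    """
    match = re.search(fname_dt_regex, fname)
    if match is None:
        return None
    naive = datetime.strptime(match.group(1), fname_dt_format)
    return naive.replace(tzinfo=ZoneInfo(fname_dt_timezone))
```

For instance, `date_from_filename("trades_2024-03-15.csv", r"trades_(\d{4}-\d{2}-\d{2})\.csv", "%Y-%m-%d", "America/New_York")` gives midnight on 2024-03-15, New York time.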

<Warning>
  Ensure the CloudQuant Data Liberator process has read permissions on the `location` directory and all files within it. Permission errors will cause silent failures during query execution.
</Warning>
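
Because permission problems do not surface as explicit errors, it can help to check access up front. This Python sketch is a hypothetical helper, not a Liberator feature. It reports the directory or matching files that the current OS user cannot read, so run it as the same user the Liberator process runs as. `os.access` checks the real user ID, and a root user will typically see files as readable even when the service account cannot read them.

```python
import os
from pathlib import Path


def unreadable_paths(location: str, file_pattern: str) -> list[str]:
    """Return paths under `location` that this process cannot read.

    An empty list means the directory and every file matching
    `file_pattern` passed the check.
    """
    root = Path(location)
    # Listing a directory needs both read and execute (traverse) permission;
    # a missing directory also fails this check.
    if not os.access(root, os.R_OK | os.X_OK):
        return [str(root)]
    return [
        str(p)
        for p in sorted(root.glob(file_pattern))
        if p.is_file() and not os.access(p, os.R_OK)
    ]
```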

## Tab-separated files (TSV)

For TSV files, set `sep_override` to `"\t"`:

```json
{
  "data_args": {
    "file_pattern": "*.tsv",
    "sep_override": "\t",
    "data_dt_column": "date",
    "data_dt_format": "%Y%m%d",
    "data_dt_timezone": "UTC",
    "data_key_column": "ticker"
  }
}
```
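
The `"\t"` value in the JSON above decodes to a single tab character, which is the delimiter a CSV parser needs. Python's standard `csv` module illustrates the effect of `sep_override`. This sketch covers only the parsing step, and `read_delimited` is a hypothetical name.

```python
import csv
import io


def read_delimited(text: str, sep: str = ",") -> list[dict]:
    """Parse header-first delimited text into one dict per data row."""
    # `sep` plays the role of sep_override; "\t" here is a real tab character.
    return list(csv.DictReader(io.StringIO(text), delimiter=sep))
```

The resulting `date` strings can then be parsed with the `"%Y%m%d"` format from the example, e.g. `datetime.strptime("20240315", "%Y%m%d")`.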

## Composite key example

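When the symbol is built from multiple columns, set `data_key_column` to a list of parts. Each part is either a `column` reference or a `literal` string: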

```json
{
  "data_key_column": [
    { "type": "column", "value": "exchange" },
    { "type": "literal", "value": "_" },
    { "type": "column", "value": "ticker" }
  ]
}
```

This produces keys like `NYSE_AAPL`, `NASDAQ_MSFT`, etc.
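
The key-building rule implied by this example is to walk the parts in order. Each `column` part is replaced by that row's value, and each `literal` part is copied verbatim. The Python sketch below assumes the parts are concatenated with no implicit separator, since the example supplies `_` explicitly. `build_key` is hypothetical.

```python
def build_key(row: dict, spec: list[dict]) -> str:
    """Build a symbol key from a composite data_key_column spec."""
    parts = []
    for part in spec:
        if part["type"] == "column":
            parts.append(str(row[part["value"]]))  # value taken from the row
        elif part["type"] == "literal":
            parts.append(part["value"])  # fixed text, copied as-is
        else:
            raise ValueError(f"unknown part type: {part['type']!r}")
    return "".join(parts)
```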

## Multiple datetime columns

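When the date and time are in separate columns, list both columns in `data_dt_column` and give one format per column, in the same order, in `data_dt_format`: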

```json
{
  "data_dt_column": ["trade_date", "trade_time"],
  "data_dt_format": ["%Y-%m-%d", "%H:%M:%S.%f"]
}
```

CloudQuant Data Liberator concatenates the columns with a space before parsing, so the effective format becomes `"%Y-%m-%d %H:%M:%S.%f"`.
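
You can reproduce the concatenation rule described above in Python to check a format pair before deploying it. The sketch joins the column values and the format strings with a single space, then parses once. The length check is an added assumption, and `parse_multi_column_dt` is a hypothetical name.

```python
from datetime import datetime


def parse_multi_column_dt(row: dict, columns: list[str], formats: list[str]) -> datetime:
    """Join values and formats with a space, then parse with strptime."""
    if len(columns) != len(formats):
        raise ValueError("data_dt_column and data_dt_format must have the same length")
    value = " ".join(row[c] for c in columns)  # e.g. "2024-03-15 09:30:00.250000"
    fmt = " ".join(formats)                    # e.g. "%Y-%m-%d %H:%M:%S.%f"
    return datetime.strptime(value, fmt)
```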
