# Amazon S3

> Configure S3 or S3-compatible object storage datasources

S3 datasources allow CloudQuant Data Liberator to read CSV, TSV, and Parquet files directly from Amazon S3 buckets or S3-compatible object storage services (MinIO, Wasabi, Backblaze B2, etc.).

<Note>
  See [Supported Data Formats](/datasource-config/supported-formats) for every file extension Liberator can ingest on S3, including formats added in 2.1 and 2.2.
</Note>

## Connection configuration

### Required fields

| Field                   | Type   | Description                                         |
| ----------------------- | ------ | --------------------------------------------------- |
| `connection_type`       | string | Must be `"s3"`                                      |
| `aws_access_key_id`     | string | AWS access key ID                                   |
| `aws_secret_access_key` | string | AWS secret access key                               |
| `bucket`                | string | S3 bucket name                                      |
| `endpoint`              | string | S3 endpoint URL, e.g., `"https://s3.amazonaws.com"` |

### Optional fields

| Field           | Type   | Default     | Description                                      |
| --------------- | ------ | ----------- | ------------------------------------------------ |
| `prefix`        | string | `""`        | Key prefix (virtual directory) within the bucket |
| `request_style` | string | `"virtual"` | S3 request style: `"path"` or `"virtual"`        |
| `mount_point`   | string |             | Local mount path for FUSE-based access           |
| `config_name`   | string |             | Internal configuration identifier                |
| `object_key`    | string |             | Object key pattern for file selection            |

<Note>
  For S3-compatible services (MinIO, Wasabi, etc.), set `request_style` to `"path"` and update the `endpoint` to point to your service. Virtual-hosted style is the default for AWS S3.
</Note>
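
To make the difference between the two request styles concrete, the sketch below builds the URL an S3 client would request for a single object under each style. It reflects general S3 addressing rules rather than Liberator's internals; the helper name `object_url` and the sample object keys are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, request_style: str = "virtual") -> str:
    """Build the URL an S3 client would request for one object.

    Hypothetical helper for illustration only; it mirrors general S3
    addressing rules, not Liberator's own request logic.
    """
    parts = urlsplit(endpoint)
    key = key.lstrip("/")
    if request_style == "virtual":
        # Virtual-hosted style: the bucket becomes part of the hostname.
        netloc = f"{bucket}.{parts.netloc}"
        path = f"/{key}"
    elif request_style == "path":
        # Path style: the bucket is the first segment of the URL path.
        netloc = parts.netloc
        path = f"/{bucket}/{key}"
    else:
        raise ValueError(f"unknown request_style: {request_style!r}")
    return urlunsplit((parts.scheme, netloc, path, "", ""))


# AWS S3 with the default virtual-hosted style.
aws_url = object_url(
    "https://s3.amazonaws.com", "my-market-data", "daily/equities/trades_20240115.csv"
)

# A self-hosted, S3-compatible endpoint with path style.
minio_url = object_url(
    "https://minio.internal.example.com:9000", "market-data", "bars.parquet", "path"
)
```

Virtual-hosted style needs DNS (and TLS certificates) that resolve `<bucket>.<endpoint host>`, which self-hosted services often do not provide. That is the usual reason to choose `"path"` for them.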

### Example connection

```json
{
  "name": "s3-market-data",
  "connection_type": "s3",
  "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "bucket": "my-market-data",
  "endpoint": "https://s3.amazonaws.com",
  "prefix": "daily/equities/",
  "request_style": "virtual"
}
```

<Warning>
  Never commit AWS credentials to version control. Use environment variables or a secrets manager to inject credentials at deployment time.
</Warning>
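
One way to follow this advice is to generate the connection config at deployment time from environment variables. The sketch below is a minimal example of that idea: `build_s3_connection` is a hypothetical helper, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the conventional AWS variable names, and how the resulting config is handed to Liberator depends on your deployment.

```python
import os
from typing import Mapping


def build_s3_connection(env: Mapping[str, str] = os.environ) -> dict:
    """Assemble an S3 connection config with credentials read from `env`.

    Hypothetical helper: the field names come from the tables above, but
    nothing here is part of Liberator itself.
    """
    required_vars = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required_vars if not env.get(name)]
    if missing:
        # Fail fast instead of writing a config with empty credentials.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "name": "s3-market-data",
        "connection_type": "s3",
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "bucket": "my-market-data",
        "endpoint": "https://s3.amazonaws.com",
        "prefix": "daily/equities/",
        "request_style": "virtual",
    }
```

The version-controlled part of the config then contains only non-secret values, and the credentials exist only in the environment of the deployed process.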

## CSV/TSV dataset

CSV and TSV datasets on S3 take the same `data_args` as [Local File](/datasource-config/local-file) sources. The `file_pattern` is evaluated relative to the `prefix` configured on the connection.
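
As a rough illustration of how a connection `prefix` and a dataset `file_pattern` combine, the sketch below filters a list of object keys. It assumes Python's `fnmatch` glob rules as a stand-in for Liberator's matcher, whose exact semantics may differ (for instance, `fnmatch` lets `*` match `/`). The function name and the sample keys are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_keys(keys, prefix, file_pattern):
    """Return the keys under `prefix` whose remainder matches `file_pattern`.

    Illustrative approximation only; not Liberator's actual matching code.
    """
    matches = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the connection's prefix
        relative = key[len(prefix):]  # the part the pattern is applied to
        if fnmatchcase(relative, file_pattern):
            matches.append(key)
    return matches


keys = [
    "daily/equities/trades_20240115.csv",
    "daily/equities/quotes_20240115.csv",
    "daily/futures/trades_20240115.csv",
]
selected = matching_keys(keys, "daily/equities/", "trades_*.csv")
```

Here only `daily/equities/trades_20240115.csv` is selected: the futures file is outside the prefix, and the quotes file does not match the pattern.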

### Required data\_args

| Field             | Type           | Description                                                         |
| ----------------- | -------------- | ------------------------------------------------------------------- |
| `file_pattern`    | string         | Glob pattern relative to the prefix, e.g., `"*.csv"`                |
| `data_dt_column`  | string or list | Column(s) containing the datetime value                             |
| `data_dt_format`  | string or list | strptime format or special values (`"muts"`, `"uts"`, `"datetime"`) |
| `data_key_column` | string or list | Symbol/key column(s)                                                |

### Optional data\_args

| Field                     | Type   | Default              | Description                              |
| ------------------------- | ------ | -------------------- | ---------------------------------------- |
| `sep_override`            | string | `","`                | Delimiter character                      |
| `encoding`                | string | `"utf-8"`            | File encoding                            |
| `data_dt_timezone`        | string | `"UTC"`              | Source data timezone                     |
| `fname_dt_regex`          | string |                      | Regex to extract date from filename      |
| `fname_dt_format`         | string |                      | strptime format for filename date        |
| `fname_dt_timezone`       | string |                      | Timezone of filename date                |
| `fname_dt_nudge`          | int    | `0`                  | Microsecond offset for filename date     |
| `fname_dt_approx_seconds` | int    |                      | Approximate seconds per file             |
| `arrow_sort`              | list   | `["symbol", "muts"]` | Sort order                               |
| `arrow_timestamp`         | bool   | `true`               | Generate human-readable timestamp column |
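
The two tables above can be read as a small contract: four fields must be present, and the optional fields fall back to their documented defaults. The sketch below expresses that contract as a hypothetical pre-flight check, `resolve_data_args`, that you could run before deploying a dataset config. It is not Liberator's own validation, and the tab value for `sep_override` in the TSV example is an assumption based on the field's description.

```python
import copy

# Required fields and defaults as listed in the tables above.
REQUIRED = ("file_pattern", "data_dt_column", "data_dt_format", "data_key_column")
DEFAULTS = {
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_timezone": "UTC",
    "fname_dt_nudge": 0,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": True,
}


def resolve_data_args(data_args: dict) -> dict:
    """Check required fields and fill in documented defaults (hypothetical helper)."""
    missing = [name for name in REQUIRED if name not in data_args]
    if missing:
        raise ValueError("missing required data_args: " + ", ".join(missing))
    resolved = copy.deepcopy(DEFAULTS)  # avoid sharing the default list
    resolved.update(data_args)          # explicit values win over defaults
    return resolved


# A TSV dataset: only the delimiter differs from the CSV defaults (assumed value).
tsv_args = resolve_data_args({
    "file_pattern": "trades_*.tsv",
    "data_dt_column": "timestamp",
    "data_dt_format": "%Y-%m-%d %H:%M:%S.%f",
    "data_key_column": "symbol",
    "sep_override": "\t",
})
```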

### Complete CSV example

```json
{
  "name": "s3-equity-trades",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": "timestamp",
    "data_dt_format": "%Y-%m-%d %H:%M:%S.%f",
    "data_dt_timezone": "UTC",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv",
    "fname_dt_format": "%Y%m%d",
    "fname_dt_timezone": "UTC",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "timestamp", "type": "string", "group": "time", "description": "Trade timestamp" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "size", "type": "int64", "group": "value", "description": "Trade size" },
    { "name": "exchange", "type": "string", "group": "value", "description": "Exchange code" }
  ]
}
```
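
Before deploying a config like this, it can help to check the date-related settings against a real filename and a real timestamp value. The sketch below does that with Python's `re` and `datetime.strptime`, using the values from the example above. It assumes the first capture group of `fname_dt_regex` carries the date and that the regex is searched within the key; Liberator's exact matching behavior may differ. The function names and sample inputs are hypothetical.

```python
import re
from datetime import datetime, timezone

# Values from the example above. In JSON the backslashes are doubled
# ("\\d"); in a Python raw string they are written once.
FNAME_DT_REGEX = r"trades_(\d{8})\.csv"
FNAME_DT_FORMAT = "%Y%m%d"
DATA_DT_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def filename_date(key: str) -> datetime:
    """Extract the file date from an object key (assumes capture group 1 holds it)."""
    match = re.search(FNAME_DT_REGEX, key)
    if match is None:
        raise ValueError(f"no date found in {key!r}")
    parsed = datetime.strptime(match.group(1), FNAME_DT_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)  # fname_dt_timezone is "UTC"


def row_datetime(value: str) -> datetime:
    """Parse one value of the `timestamp` column using data_dt_format."""
    parsed = datetime.strptime(value, DATA_DT_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)  # data_dt_timezone is "UTC"


file_dt = filename_date("daily/equities/trades_20240115.csv")
row_dt = row_datetime("2024-01-15 14:30:00.250000")
```

If either call raises `ValueError` for your own filenames or sample rows, the regex or format string in the config needs adjusting.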

## Parquet dataset

Parquet datasets give Liberator passthrough access to the Parquet data without intermediate caching, using Arrow's native Parquet reader.

<Tip>
  Reading Parquet files directly gives you zero-copy reads and columnar pushdown, which is significantly more efficient than converting the files to CSV first.
</Tip>

### Complete Parquet example

```json
{
  "name": "s3-equity-bars-parquet",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "bars_*.parquet",
    "data_dt_column": "bar_time",
    "data_dt_format": "datetime",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "bars_(\\d{4}-\\d{2}-\\d{2})\\.parquet",
    "fname_dt_format": "%Y-%m-%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "bar_time", "type": "string", "group": "time", "description": "Bar timestamp" },
    { "name": "open", "type": "double", "group": "value", "description": "Open price" },
    { "name": "high", "type": "double", "group": "value", "description": "High price" },
    { "name": "low", "type": "double", "group": "value", "description": "Low price" },
    { "name": "close", "type": "double", "group": "value", "description": "Close price" },
    { "name": "volume", "type": "int64", "group": "value", "description": "Bar volume" }
  ]
}
```

## S3-compatible storage

### MinIO example

```json
{
  "name": "minio-connection",
  "connection_type": "s3",
  "aws_access_key_id": "minio-access-key",
  "aws_secret_access_key": "minio-secret-key",
  "bucket": "market-data",
  "endpoint": "https://minio.internal.example.com:9000",
  "request_style": "path"
}
```

<Note>
  Most S3-compatible services require `request_style` set to `"path"`. The default, `"virtual"` (virtual-hosted style), is intended for AWS S3.
</Note>

## IAM permissions

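The IAM user or role associated with the access key needs at least the following permissions:

* `s3:GetObject` on the objects in the bucket (resource `arn:aws:s3:::<bucket>/*`)
* `s3:ListBucket` on the bucket itself (resource `arn:aws:s3:::<bucket>`)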

See the [S3 Bucket Setup](/integrations/s3-bucket-setup) guide for detailed IAM policy configuration.
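
As a starting point, the two permissions listed above correspond to a policy document like the one generated below. This is a minimal sketch using the standard IAM policy JSON structure. The helper `minimal_read_policy`, the `Sid` labels, and the optional narrowing of `s3:GetObject` to a key prefix are illustrative choices, not requirements stated by Liberator.

```python
import json


def minimal_read_policy(bucket: str, prefix: str = "") -> dict:
    """Build a read-only IAM policy for one bucket (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # GetObject applies to object ARNs (bucket/key).
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


policy = minimal_read_policy("my-market-data", "daily/equities/")
policy_json = json.dumps(policy, indent=2)  # paste into the IAM console or CLI
```

Note that the two actions target different resources: attaching `s3:ListBucket` to the object ARN (`.../*`) or `s3:GetObject` to the bucket ARN is a common cause of access-denied errors.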
