Amazon S3
S3 datasources allow CloudQuant Data Liberator to read CSV, TSV, and Parquet files directly from Amazon S3 buckets or S3-compatible object storage services (MinIO, Wasabi, Backblaze B2, etc.).
See Supported Data Formats for every file extension Liberator can ingest on S3, including formats added in 2.1 and 2.2.
Connection configuration
Required fields
| Field | Type | Description |
|---|---|---|
| connection_type | string | Must be "s3" |
| aws_access_key_id | string | AWS access key ID |
| aws_secret_access_key | string | AWS secret access key |
| bucket | string | S3 bucket name |
| endpoint | string | S3 endpoint URL, e.g., "https://s3.amazonaws.com" |
Optional fields
| Field | Type | Default | Description |
|---|---|---|---|
| prefix | string | "" | Key prefix (virtual directory) within the bucket |
| request_style | string | "virtual" | S3 request style: "path" or "virtual" |
| mount_point | string | | Local mount path for FUSE-based access |
| config_name | string | | Internal configuration identifier |
| object_key | string | | Object key pattern for file selection |
For S3-compatible services (MinIO, Wasabi, etc.), set request_style to "path" and update the endpoint to point to your service. Virtual-hosted style is the default for AWS S3.
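To make the difference between the two request styles concrete, here is a minimal Python sketch of how an S3 client typically forms the object URL in each case. The helper `object_url` is hypothetical and is not part of Liberator; it only illustrates the general S3 addressing convention and ignores any path component already present in the endpoint.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, request_style: str = "virtual") -> str:
    """Build the URL an S3 client would request for one object (illustrative only)."""
    parts = urlsplit(endpoint)
    encoded_key = quote(key, safe="/")  # keep "/" so the prefix structure survives
    if request_style == "path":
        # Path style: the bucket is the first segment of the URL path.
        host, path = parts.netloc, f"/{bucket}/{encoded_key}"
    elif request_style == "virtual":
        # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
        host, path = f"{bucket}.{parts.netloc}", f"/{encoded_key}"
    else:
        raise ValueError(f"unknown request_style: {request_style!r}")
    return urlunsplit((parts.scheme, host, path, "", ""))


aws_url = object_url("https://s3.amazonaws.com", "my-market-data",
                     "daily/equities/trades_20240102.csv")
minio_url = object_url("https://minio.internal.example.com:9000", "market-data",
                       "trades_20240102.csv", request_style="path")
print(aws_url)
print(minio_url)
```

Virtual-hosted style requires DNS to resolve a per-bucket hostname, which self-hosted services often do not provide; that is the usual reason they need "path".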
Example connection
{
  "name": "s3-market-data",
  "connection_type": "s3",
  "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "bucket": "my-market-data",
  "endpoint": "https://s3.amazonaws.com",
  "prefix": "daily/equities/",
  "request_style": "virtual"
}
Never commit AWS credentials to version control. Use environment variables or a secrets manager to inject credentials at deployment time.
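One way to follow this advice is to assemble the connection definition at deployment time. The sketch below is an assumption about workflow, not a documented Liberator feature: the helper `connection_from_env` is hypothetical, it reads the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, and it checks the required fields listed above before returning the dict. How the resulting definition is registered with Liberator is outside the scope of this sketch.

```python
import os

# Required connection fields, as listed in the table above.
REQUIRED_FIELDS = ("connection_type", "aws_access_key_id",
                   "aws_secret_access_key", "bucket", "endpoint")


def connection_from_env(name, bucket, endpoint, env=None, **optional):
    """Assemble an S3 connection dict, taking credentials from the environment."""
    env = os.environ if env is None else env
    conn = {
        "name": name,
        "connection_type": "s3",
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID", ""),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY", ""),
        "bucket": bucket,
        "endpoint": endpoint,
        **optional,  # e.g. prefix="daily/equities/", request_style="virtual"
    }
    missing = [field for field in REQUIRED_FIELDS if not conn.get(field)]
    if missing:
        raise ValueError(f"missing required connection fields: {', '.join(missing)}")
    return conn


# A stand-in environment keeps the example self-contained; omit `env` to use os.environ.
demo_env = {"AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
            "AWS_SECRET_ACCESS_KEY": "example-secret"}
conn = connection_from_env("s3-market-data", "my-market-data",
                           "https://s3.amazonaws.com", env=demo_env,
                           prefix="daily/equities/")
print(sorted(conn))
```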
CSV/TSV dataset
The data_args are identical to Local File sources. The file_pattern is evaluated relative to the prefix configured on the connection.
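The following Python sketch illustrates what "relative to the prefix" means for object selection. It is an approximation under stated assumptions: `select_keys` is a hypothetical helper, and it uses Python's `fnmatch` rules, in which `*` also matches "/". Liberator's own glob semantics may differ.

```python
from fnmatch import fnmatchcase


def select_keys(keys, prefix, file_pattern):
    """Return keys under `prefix` whose remainder matches the glob `file_pattern`."""
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the connection's prefix, never considered
        relative = key[len(prefix):]  # the pattern sees only this part
        if fnmatchcase(relative, file_pattern):
            selected.append(key)
    return selected


keys = [
    "daily/equities/trades_20240102.csv",
    "daily/equities/quotes_20240102.csv",
    "daily/futures/trades_20240102.csv",
]
matched = select_keys(keys, "daily/equities/", "trades_*.csv")
print(matched)
```

With the prefix "daily/equities/", only the first key is selected: the second fails the pattern and the third lies outside the prefix.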
Required data_args
| Field | Type | Description |
|---|---|---|
| file_pattern | string | Glob pattern relative to the prefix, e.g., "*.csv" |
| data_dt_column | string or list | Column(s) containing the datetime value |
| data_dt_format | string or list | strptime format or special values ("muts", "uts", "datetime") |
| data_key_column | string or list | Symbol/key column(s) |
Optional data_args
| Field | Type | Default | Description |
|---|---|---|---|
| sep_override | string | "," | Delimiter character |
| encoding | string | "utf-8" | File encoding |
| data_dt_timezone | string | "UTC" | Source data timezone |
| fname_dt_regex | string | | Regex to extract date from filename |
| fname_dt_format | string | | strptime format for filename date |
| fname_dt_timezone | string | | Timezone of filename date |
| fname_dt_nudge | int | 0 | Microsecond offset for filename date |
| fname_dt_approx_seconds | int | | Approximate seconds per file |
| arrow_sort | list | ["symbol", "muts"] | Sort order |
| arrow_timestamp | bool | true | Generate human-readable timestamp column |
Complete CSV example
{
  "name": "s3-equity-trades",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": "timestamp",
    "data_dt_format": "%Y-%m-%d %H:%M:%S.%f",
    "data_dt_timezone": "UTC",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv",
    "fname_dt_format": "%Y%m%d",
    "fname_dt_timezone": "UTC",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "timestamp", "type": "string", "group": "time", "description": "Trade timestamp" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "size", "type": "int64", "group": "value", "description": "Trade size" },
    { "name": "exchange", "type": "string", "group": "value", "description": "Exchange code" }
  ]
}
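Before deploying a dataset definition like the one above, it can help to check the regex and strptime formats locally against a sample filename and a sample row value. The Python sketch below does that with the values from the CSV example. It assumes the filename date is taken from the first capture group of fname_dt_regex, which the example suggests but which is not stated explicitly; the helper `filename_date` and the sample values are hypothetical.

```python
import re
from datetime import datetime, timezone

# Values from the CSV example. JSON needs "\\d"; a Python raw string does not.
fname_dt_regex = r"trades_(\d{8})\.csv"
fname_dt_format = "%Y%m%d"
data_dt_format = "%Y-%m-%d %H:%M:%S.%f"


def filename_date(filename):
    """Parse the file's date from the first capture group (assumed behaviour)."""
    match = re.search(fname_dt_regex, filename)
    if match is None:
        return None  # filename does not carry a date in the expected form
    parsed = datetime.strptime(match.group(1), fname_dt_format)
    return parsed.replace(tzinfo=timezone.utc)  # fname_dt_timezone is "UTC" in the example


file_dt = filename_date("trades_20240102.csv")
row_dt = datetime.strptime("2024-01-02 09:30:00.123456", data_dt_format)
print(file_dt)
print(row_dt)
```

A `None` result or a `ValueError` from `strptime` here points to a regex or format mismatch that would otherwise only surface at ingestion time.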
Parquet dataset
Parquet datasets are read in passthrough mode, without intermediate caching, using Arrow’s native Parquet reader.
Parquet's columnar layout supports zero-copy reads and columnar pushdown, so reading Parquet directly is significantly more efficient than converting it to CSV first.
Complete Parquet example
{
  "name": "s3-equity-bars-parquet",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "bars_*.parquet",
    "data_dt_column": "bar_time",
    "data_dt_format": "datetime",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "bars_(\\d{4}-\\d{2}-\\d{2})\\.parquet",
    "fname_dt_format": "%Y-%m-%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "bar_time", "type": "string", "group": "time", "description": "Bar timestamp" },
    { "name": "open", "type": "double", "group": "value", "description": "Open price" },
    { "name": "high", "type": "double", "group": "value", "description": "High price" },
    { "name": "low", "type": "double", "group": "value", "description": "Low price" },
    { "name": "close", "type": "double", "group": "value", "description": "Close price" },
    { "name": "volume", "type": "int64", "group": "value", "description": "Bar volume" }
  ]
}
S3-compatible storage
MinIO example
{
  "name": "minio-connection",
  "connection_type": "s3",
  "aws_access_key_id": "minio-access-key",
  "aws_secret_access_key": "minio-secret-key",
  "bucket": "market-data",
  "endpoint": "https://minio.internal.example.com:9000",
  "request_style": "path"
}
Most S3-compatible services require request_style to be set to "path". The default, "virtual" (virtual-hosted style), is intended for AWS S3.
IAM permissions
The IAM user or role associated with the access key needs, at minimum:
- s3:GetObject on the bucket objects
- s3:ListBucket on the bucket
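As a rough illustration of these two permissions, the Python sketch below builds a minimal read-only IAM policy document. The helper `read_only_policy` and the `Sid` values are hypothetical, and the bucket name is taken from the example connection. Note the resource difference: s3:ListBucket applies to the bucket ARN, while s3:GetObject applies to the objects under it.

```python
import json


def read_only_policy(bucket):
    """Build a minimal IAM policy granting list and read access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",  # the bucket itself
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
            },
        ],
    }


policy = read_only_policy("my-market-data")
print(json.dumps(policy, indent=2))
```

The object resource could be narrowed to the connection's prefix (for example `my-market-data/daily/equities/*`) if the credentials should not reach the rest of the bucket.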
See the S3 Bucket Setup guide for detailed IAM policy configuration.