# Amazon S3
S3 datasources allow CloudQuant Data Liberator to read CSV, TSV, and Parquet files directly from Amazon S3 buckets or S3-compatible object storage services (MinIO, Wasabi, Backblaze B2, etc.).
## Connection Configuration

### Required Fields

| Field | Type | Description |
|---|---|---|
| connection_type | string | Must be "s3" |
| aws_access_key_id | string | AWS access key ID |
| aws_secret_access_key | string | AWS secret access key |
| bucket | string | S3 bucket name |
| endpoint | string | S3 endpoint URL, e.g., "https://s3.amazonaws.com" |
### Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| prefix | string | "" | Key prefix (virtual directory) within the bucket |
| request_style | string | "virtual" | S3 request style: "path" or "virtual" |
| mount_point | string | | Local mount path for FUSE-based access |
| config_name | string | | Internal configuration identifier |
| object_key | string | | Object key pattern for file selection |
For S3-compatible services (MinIO, Wasabi, etc.), set request_style to "path" and update the endpoint to point to your service. Virtual-hosted style is the default for AWS S3.
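The only difference between the two request styles is how the object URL is formed. A minimal sketch, for illustration only (real S3 clients such as boto3 or the MinIO SDK construct these URLs internally):

```python
from urllib.parse import urlsplit

def object_url(endpoint: str, bucket: str, key: str,
               request_style: str = "virtual") -> str:
    """Build the request URL for an object under the two S3 request styles."""
    parts = urlsplit(endpoint)
    if request_style == "path":
        # Path style: the bucket appears in the URL path.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"

print(object_url("https://s3.amazonaws.com", "my-market-data",
                 "daily/equities/trades_20240101.csv"))
# https://my-market-data.s3.amazonaws.com/daily/equities/trades_20240101.csv
print(object_url("https://minio.internal.example.com:9000", "market-data",
                 "trades_20240101.csv", request_style="path"))
# https://minio.internal.example.com:9000/market-data/trades_20240101.csv
```

This is why path style is needed for most S3-compatible services: their endpoints typically do not resolve per-bucket subdomains.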
### Example Connection

```json
{
  "name": "s3-market-data",
  "connection_type": "s3",
  "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "bucket": "my-market-data",
  "endpoint": "https://s3.amazonaws.com",
  "prefix": "daily/equities/",
  "request_style": "virtual"
}
```
Never commit AWS credentials to version control. Use environment variables or a secrets manager to inject credentials at deployment time.
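One way to do this is to keep the committed configuration free of secrets and assemble the connection at deployment time from environment variables. A sketch (the variable names and helper are illustrative, not part of Data Liberator):

```python
import json
import os

def s3_connection_from_env(name: str, bucket: str,
                           endpoint: str = "https://s3.amazonaws.com") -> dict:
    """Assemble an S3 connection dict, pulling credentials from the
    environment instead of hard-coding them in a committed file."""
    return {
        "name": name,
        "connection_type": "s3",
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "bucket": bucket,
        "endpoint": endpoint,
    }

# Demo values only; in production these come from the deployment environment
# or a secrets manager.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
print(json.dumps(s3_connection_from_env("s3-market-data", "my-market-data"), indent=2))
```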
## CSV/TSV Dataset

The data_args are identical to those of Local File sources. The file_pattern is evaluated relative to the prefix configured on the connection.
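The prefix-relative matching can be pictured as follows. This is a sketch of the selection logic, not Data Liberator's actual implementation, which may differ in detail:

```python
import fnmatch

def select_keys(keys: list[str], prefix: str, file_pattern: str) -> list[str]:
    """Match object keys against a glob evaluated relative to the
    connection's prefix (illustrative approximation)."""
    return [k for k in keys
            if k.startswith(prefix) and fnmatch.fnmatch(k[len(prefix):], file_pattern)]

keys = [
    "daily/equities/trades_20240101.csv",
    "daily/equities/trades_20240102.csv",
    "daily/futures/trades_20240101.csv",   # different prefix, excluded
    "daily/equities/readme.txt",           # wrong extension, excluded
]
print(select_keys(keys, "daily/equities/", "trades_*.csv"))
# ['daily/equities/trades_20240101.csv', 'daily/equities/trades_20240102.csv']
```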
### Required data_args

| Field | Type | Description |
|---|---|---|
| file_pattern | string | Glob pattern relative to the prefix, e.g., "*.csv" |
| data_dt_column | string or list | Column(s) containing the datetime value |
| data_dt_format | string or list | strptime format or special values ("muts", "uts", "datetime") |
| data_key_column | string or list | Symbol/key column(s) |
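To illustrate the data_dt_format options, here is a sketch of how such a field might be interpreted. Note the assumption: "muts" and "uts" are read here as microseconds and seconds since the Unix epoch respectively; "datetime" (native datetime values, as in the Parquet example) is not shown:

```python
from datetime import datetime, timezone

def parse_dt(value: str, fmt: str) -> datetime:
    """Parse a datetime cell. 'muts'/'uts' are assumed to mean
    microseconds / seconds since the Unix epoch; anything else is
    treated as a strptime format string."""
    if fmt == "muts":
        return datetime.fromtimestamp(int(value) / 1_000_000, tz=timezone.utc)
    if fmt == "uts":
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)

print(parse_dt("1704067200000000", "muts"))
# 2024-01-01 00:00:00+00:00
print(parse_dt("2024-01-01 09:30:00.000123", "%Y-%m-%d %H:%M:%S.%f"))
```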
### Optional data_args

| Field | Type | Default | Description |
|---|---|---|---|
| sep_override | string | "," | Delimiter character |
| encoding | string | "utf-8" | File encoding |
| data_dt_timezone | string | "UTC" | Source data timezone |
| fname_dt_regex | string | | Regex to extract date from filename |
| fname_dt_format | string | | strptime format for filename date |
| fname_dt_timezone | string | | Timezone of filename date |
| fname_dt_nudge | int | 0 | Microsecond offset for filename date |
| fname_dt_approx_seconds | int | | Approximate seconds per file |
| arrow_sort | list | ["symbol", "muts"] | Sort order |
| arrow_timestamp | bool | true | Generate human-readable timestamp column |
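The fname_dt_regex / fname_dt_format pair works like this: the regex's first capture group is taken from the filename, then parsed with the strptime format. A sketch using the same values as the CSV example below (the helper itself is illustrative):

```python
import re
from datetime import datetime, timezone

def filename_dt(fname: str, dt_regex: str, dt_format: str) -> datetime:
    """Extract a file-level date via the fname_dt_regex capture group,
    then parse it with the fname_dt_format strptime pattern."""
    m = re.search(dt_regex, fname)
    if m is None:
        raise ValueError(f"{fname!r} does not match {dt_regex!r}")
    return datetime.strptime(m.group(1), dt_format).replace(tzinfo=timezone.utc)

print(filename_dt("trades_20240102.csv", r"trades_(\d{8})\.csv", "%Y%m%d"))
# 2024-01-02 00:00:00+00:00
```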
### Complete CSV Example

```json
{
  "name": "s3-equity-trades",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": "timestamp",
    "data_dt_format": "%Y-%m-%d %H:%M:%S.%f",
    "data_dt_timezone": "UTC",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv",
    "fname_dt_format": "%Y%m%d",
    "fname_dt_timezone": "UTC",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "timestamp", "type": "string", "group": "time", "description": "Trade timestamp" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "size", "type": "int64", "group": "value", "description": "Trade size" },
    { "name": "exchange", "type": "string", "group": "value", "description": "Exchange code" }
  ]
}
```
## Parquet Dataset

Parquet datasets provide passthrough access to Parquet data without intermediate caching, leveraging Arrow's native Parquet reader. Parquet's columnar layout allows column projection and predicate pushdown, so only the requested columns and row groups are read; this is significantly more efficient than converting Parquet to CSV.
### Complete Parquet Example

```json
{
  "name": "s3-equity-bars-parquet",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "bars_*.parquet",
    "data_dt_column": "bar_time",
    "data_dt_format": "datetime",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "bars_(\\d{4}-\\d{2}-\\d{2})\\.parquet",
    "fname_dt_format": "%Y-%m-%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "bar_time", "type": "string", "group": "time", "description": "Bar timestamp" },
    { "name": "open", "type": "double", "group": "value", "description": "Open price" },
    { "name": "high", "type": "double", "group": "value", "description": "High price" },
    { "name": "low", "type": "double", "group": "value", "description": "Low price" },
    { "name": "close", "type": "double", "group": "value", "description": "Close price" },
    { "name": "volume", "type": "int64", "group": "value", "description": "Bar volume" }
  ]
}
```
## S3-Compatible Storage

### MinIO Example

```json
{
  "name": "minio-connection",
  "connection_type": "s3",
  "aws_access_key_id": "minio-access-key",
  "aws_secret_access_key": "minio-secret-key",
  "bucket": "market-data",
  "endpoint": "https://minio.internal.example.com:9000",
  "request_style": "path"
}
```
Most S3-compatible services require request_style set to "path". Only AWS S3 defaults to virtual-hosted style.
## IAM Permissions

The IAM user or role associated with the access key needs, at minimum:

- s3:GetObject on the bucket objects
- s3:ListBucket on the bucket
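A minimal read-only policy granting exactly these two permissions might look like the following, shown for the example bucket my-market-data (note that s3:ListBucket applies to the bucket ARN itself, while s3:GetObject applies to the object ARNs under it):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-market-data"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-market-data/*"
    }
  ]
}
```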
See the S3 Bucket Setup guide for detailed IAM policy configuration.