# Amazon S3
S3 datasources allow CloudQuant Data Liberator to read CSV, TSV, and Parquet files directly from Amazon S3 buckets or S3-compatible object storage services (MinIO, Wasabi, Backblaze B2, etc.).
## Connection Configuration

### Required Fields

| Field | Type | Description |
|---|---|---|
| connection_type | string | Must be "s3" |
| aws_access_key_id | string | AWS access key ID |
| aws_secret_access_key | string | AWS secret access key |
| bucket | string | S3 bucket name |
| endpoint | string | S3 endpoint URL, e.g., "https://s3.amazonaws.com" |
### Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| prefix | string | "" | Key prefix (virtual directory) within the bucket |
| request_style | string | "virtual" | S3 request style: "path" or "virtual" |
| mount_point | string | | Local mount path for FUSE-based access |
| config_name | string | | Internal configuration identifier |
| object_key | string | | Object key pattern for file selection |
For S3-compatible services (MinIO, Wasabi, etc.), set request_style to "path" and update the endpoint to point to your service. Virtual-hosted style is the default for AWS S3.
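The only difference between the two request styles is how the object URL is formed. A minimal sketch, for illustration only (real S3 clients such as boto3 or the MinIO SDK construct these URLs internally):

```python
from urllib.parse import urlsplit

def object_url(endpoint: str, bucket: str, key: str,
               request_style: str = "virtual") -> str:
    """Build the request URL for an object under the two S3 request styles."""
    parts = urlsplit(endpoint)
    if request_style == "path":
        # Path style: the bucket appears in the URL path.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"

print(object_url("https://s3.amazonaws.com", "my-market-data",
                 "daily/equities/trades_20240101.csv"))
# https://my-market-data.s3.amazonaws.com/daily/equities/trades_20240101.csv
print(object_url("https://minio.internal.example.com:9000", "market-data",
                 "trades_20240101.csv", request_style="path"))
# https://minio.internal.example.com:9000/market-data/trades_20240101.csv
```

This is why path style is needed for most S3-compatible services: their endpoints typically do not resolve per-bucket subdomains.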
### Example Connection

```json
{
  "name": "s3-market-data",
  "connection_type": "s3",
  "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "bucket": "my-market-data",
  "endpoint": "https://s3.amazonaws.com",
  "prefix": "daily/equities/",
  "request_style": "virtual"
}
```
Never commit AWS credentials to version control. Use environment variables or a secrets manager to inject credentials at deployment time.
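One way to do this is to keep the committed configuration free of secrets and assemble the connection at deployment time from environment variables. A sketch (the variable names and helper are illustrative, not part of Data Liberator):

```python
import json
import os

def s3_connection_from_env(name: str, bucket: str,
                           endpoint: str = "https://s3.amazonaws.com") -> dict:
    """Assemble an S3 connection dict, pulling credentials from the
    environment instead of hard-coding them in a committed file."""
    return {
        "name": name,
        "connection_type": "s3",
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "bucket": bucket,
        "endpoint": endpoint,
    }

# Demo values only; in production these come from the deployment environment
# or a secrets manager.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
print(json.dumps(s3_connection_from_env("s3-market-data", "my-market-data"), indent=2))
```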
## CSV/TSV Dataset

The data_args are identical to those of Local File sources. The file_pattern is evaluated relative to the prefix configured on the connection.
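The prefix-relative matching can be pictured as follows. This is a sketch of the selection logic, not Data Liberator's actual implementation, which may differ in detail:

```python
import fnmatch

def select_keys(keys: list[str], prefix: str, file_pattern: str) -> list[str]:
    """Match object keys against a glob evaluated relative to the
    connection's prefix (illustrative approximation)."""
    return [k for k in keys
            if k.startswith(prefix) and fnmatch.fnmatch(k[len(prefix):], file_pattern)]

keys = [
    "daily/equities/trades_20240101.csv",
    "daily/equities/trades_20240102.csv",
    "daily/futures/trades_20240101.csv",   # different prefix, excluded
    "daily/equities/readme.txt",           # wrong extension, excluded
]
print(select_keys(keys, "daily/equities/", "trades_*.csv"))
# ['daily/equities/trades_20240101.csv', 'daily/equities/trades_20240102.csv']
```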
### Required data_args

| Field | Type | Description |
|---|---|---|
| file_pattern | string | Glob pattern relative to the prefix, e.g., "*.csv" |
| data_dt_column | string or list | Column(s) containing the datetime value |
| data_dt_format | string or list | strptime format or special values ("muts", "uts", "datetime") |
| data_key_column | string or list | Symbol/key column(s) |
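To illustrate the data_dt_format options, here is a sketch of how such a field might be interpreted. Note the assumption: "muts" and "uts" are read here as microseconds and seconds since the Unix epoch respectively; "datetime" (native datetime values, as in the Parquet example) is not shown:

```python
from datetime import datetime, timezone

def parse_dt(value: str, fmt: str) -> datetime:
    """Parse a datetime cell. 'muts'/'uts' are assumed to mean
    microseconds / seconds since the Unix epoch; anything else is
    treated as a strptime format string."""
    if fmt == "muts":
        return datetime.fromtimestamp(int(value) / 1_000_000, tz=timezone.utc)
    if fmt == "uts":
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)

print(parse_dt("1704067200000000", "muts"))
# 2024-01-01 00:00:00+00:00
print(parse_dt("2024-01-01 09:30:00.000123", "%Y-%m-%d %H:%M:%S.%f"))
```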
### Optional data_args

| Field | Type | Default | Description |
|---|---|---|---|
| sep_override | string | "," | Delimiter character |
| encoding | string | "utf-8" | File encoding |
| data_dt_timezone | string | "UTC" | Source data timezone |
| fname_dt_regex | string | | Regex to extract date from filename |
| fname_dt_format | string | | strptime format for filename date |
| fname_dt_timezone | string | | Timezone of filename date |
| fname_dt_nudge | int | 0 | Microsecond offset for filename date |
| fname_dt_approx_seconds | int | | Approximate seconds per file |
| arrow_sort | list | ["symbol", "muts"] | Sort order |
| arrow_timestamp | bool | true | Generate human-readable timestamp column |
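The fname_dt_regex / fname_dt_format pair works like this: the regex's first capture group is taken from the filename, then parsed with the strptime format. A sketch using the same values as the CSV example below (the helper itself is illustrative):

```python
import re
from datetime import datetime, timezone

def filename_dt(fname: str, dt_regex: str, dt_format: str) -> datetime:
    """Extract a file-level date via the fname_dt_regex capture group,
    then parse it with the fname_dt_format strptime pattern."""
    m = re.search(dt_regex, fname)
    if m is None:
        raise ValueError(f"{fname!r} does not match {dt_regex!r}")
    return datetime.strptime(m.group(1), dt_format).replace(tzinfo=timezone.utc)

print(filename_dt("trades_20240102.csv", r"trades_(\d{8})\.csv", "%Y%m%d"))
# 2024-01-02 00:00:00+00:00
```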
### Complete CSV Example

```json
{
  "name": "s3-equity-trades",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": "timestamp",
    "data_dt_format": "%Y-%m-%d %H:%M:%S.%f",
    "data_dt_timezone": "UTC",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv",
    "fname_dt_format": "%Y%m%d",
    "fname_dt_timezone": "UTC",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "timestamp", "type": "string", "group": "time", "description": "Trade timestamp" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "size", "type": "int64", "group": "value", "description": "Trade size" },
    { "name": "exchange", "type": "string", "group": "value", "description": "Exchange code" }
  ]
}
```
## Parquet Dataset

Parquet datasets provide passthrough access to Parquet data without intermediate caching, leveraging Arrow's native Parquet reader. Parquet's columnar layout allows column projection and predicate pushdown, so only the requested columns and row groups are read; this is significantly more efficient than converting Parquet to CSV.
### Complete Parquet Example

```json
{
  "name": "s3-equity-bars-parquet",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "bars_*.parquet",
    "data_dt_column": "bar_time",
    "data_dt_format": "datetime",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "bars_(\\d{4}-\\d{2}-\\d{2})\\.parquet",
    "fname_dt_format": "%Y-%m-%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "bar_time", "type": "string", "group": "time", "description": "Bar timestamp" },
    { "name": "open", "type": "double", "group": "value", "description": "Open price" },
    { "name": "high", "type": "double", "group": "value", "description": "High price" },
    { "name": "low", "type": "double", "group": "value", "description": "Low price" },
    { "name": "close", "type": "double", "group": "value", "description": "Close price" },
    { "name": "volume", "type": "int64", "group": "value", "description": "Bar volume" }
  ]
}
```
## S3-Compatible Storage

### MinIO Example

```json
{
  "name": "minio-connection",
  "connection_type": "s3",
  "aws_access_key_id": "minio-access-key",
  "aws_secret_access_key": "minio-secret-key",
  "bucket": "market-data",
  "endpoint": "https://minio.internal.example.com:9000",
  "request_style": "path"
}
```
Most S3-compatible services require request_style set to "path". Only AWS S3 defaults to virtual-hosted style.
## IAM Permissions

The IAM user or role associated with the access key needs, at minimum:

- s3:GetObject on the bucket objects
- s3:ListBucket on the bucket
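A minimal read-only policy granting exactly these two permissions might look like the following, shown for the example bucket my-market-data (note that s3:ListBucket applies to the bucket ARN itself, while s3:GetObject applies to the object ARNs under it):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-market-data"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-market-data/*"
    }
  ]
}
```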
See the S3 Bucket Setup guide for detailed IAM policy configuration.