
Amazon S3

S3 datasources allow CloudQuant Data Liberator to read CSV, TSV, and Parquet files directly from Amazon S3 buckets or S3-compatible object storage services (MinIO, Wasabi, Backblaze B2, etc.).

Connection Configuration

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| connection_type | string | Must be "s3" |
| aws_access_key_id | string | AWS access key ID |
| aws_secret_access_key | string | AWS secret access key |
| bucket | string | S3 bucket name |
| endpoint | string | S3 endpoint URL, e.g., "https://s3.amazonaws.com" |

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| prefix | string | "" | Key prefix (virtual directory) within the bucket |
| request_style | string | "virtual" | S3 request style: "path" or "virtual" |
| mount_point | string | | Local mount path for FUSE-based access |
| config_name | string | | Internal configuration identifier |
| object_key | string | | Object key pattern for file selection |

For S3-compatible services (MinIO, Wasabi, etc.), set request_style to "path" and update the endpoint to point to your service. Virtual-hosted style is the default for AWS S3.

Example Connection

{
  "name": "s3-market-data",
  "connection_type": "s3",
  "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "bucket": "my-market-data",
  "endpoint": "https://s3.amazonaws.com",
  "prefix": "daily/equities/",
  "request_style": "virtual"
}
Never commit AWS credentials to version control. Use environment variables or a secrets manager to inject credentials at deployment time.
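One way to follow that advice is to assemble the connection document at deployment time, reading the credentials from the environment. The helper below is a hypothetical sketch (Data Liberator does not ship this function); it only shows the pattern of keeping secrets out of checked-in config.

```python
import json
import os

def build_s3_connection(name, bucket, endpoint="https://s3.amazonaws.com"):
    """Build an S3 connection document, pulling credentials from the
    environment instead of hard-coding them in a committed file."""
    return {
        "name": name,
        "connection_type": "s3",
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "bucket": bucket,
        "endpoint": endpoint,
        "request_style": "virtual",
    }

# Example credentials from the AWS documentation, injected here only
# so the snippet is self-contained.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")

conn = build_s3_connection("s3-market-data", "my-market-data")
print(json.dumps(conn, indent=2))
```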

CSV/TSV Dataset

The data_args are identical to those for Local File sources. The file_pattern is evaluated relative to the prefix configured on the connection.
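To make the prefix/pattern relationship concrete, here is an illustrative sketch of how a glob pattern combines with the connection's prefix to select object keys. The matching is done by Data Liberator itself; the key names and the fnmatch-based logic below are assumptions for demonstration only.

```python
from fnmatch import fnmatch

prefix = "daily/equities/"       # from the connection
file_pattern = "trades_*.csv"    # from the dataset's data_args

# Hypothetical object keys in the bucket.
keys = [
    "daily/equities/trades_20240115.csv",
    "daily/equities/trades_20240116.csv",
    "daily/equities/quotes_20240115.csv",
    "monthly/equities/trades_202401.csv",
]

# A key is selected when it lives under the prefix and the remainder
# matches the glob pattern.
matched = [
    k for k in keys
    if k.startswith(prefix) and fnmatch(k[len(prefix):], file_pattern)
]
print(matched)
```

Note that the quotes file and the key outside the prefix are both excluded.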

Required data_args

| Field | Type | Description |
| --- | --- | --- |
| file_pattern | string | Glob pattern relative to the prefix, e.g., "*.csv" |
| data_dt_column | string or list | Column(s) containing the datetime value |
| data_dt_format | string or list | strptime format or special values ("muts", "uts", "datetime") |
| data_key_column | string or list | Symbol/key column(s) |

Optional data_args

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| sep_override | string | "," | Delimiter character |
| encoding | string | "utf-8" | File encoding |
| data_dt_timezone | string | "UTC" | Source data timezone |
| fname_dt_regex | string | | Regex to extract date from filename |
| fname_dt_format | string | | strptime format for filename date |
| fname_dt_timezone | string | | Timezone of filename date |
| fname_dt_nudge | int | 0 | Microsecond offset for filename date |
| fname_dt_approx_seconds | int | | Approximate seconds per file |
| arrow_sort | list | ["symbol", "muts"] | Sort order |
| arrow_timestamp | bool | true | Generate human-readable timestamp column |
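The fname_dt_* fields work together: the first capture group of fname_dt_regex is parsed with the fname_dt_format strptime string. The field names below come from the table above, but the extraction logic shown is a simplified sketch, not Data Liberator's actual implementation.

```python
import re
from datetime import datetime, timezone

fname_dt_regex = r"trades_(\d{8})\.csv"
fname_dt_format = "%Y%m%d"

# Extract the date portion of the filename via the first capture group,
# then parse it with the strptime format and attach the configured timezone.
match = re.search(fname_dt_regex, "trades_20240115.csv")
file_date = datetime.strptime(match.group(1), fname_dt_format).replace(
    tzinfo=timezone.utc  # fname_dt_timezone: "UTC"
)
print(file_date.isoformat())  # 2024-01-15T00:00:00+00:00
```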

Complete CSV Example

{
  "name": "s3-equity-trades",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": "timestamp",
    "data_dt_format": "%Y-%m-%d %H:%M:%S.%f",
    "data_dt_timezone": "UTC",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv",
    "fname_dt_format": "%Y%m%d",
    "fname_dt_timezone": "UTC",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "timestamp", "type": "string", "group": "time", "description": "Trade timestamp" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "size", "type": "int64", "group": "value", "description": "Trade size" },
    { "name": "exchange", "type": "string", "group": "value", "description": "Exchange code" }
  ]
}
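To see how a timestamp cell from this dataset would be interpreted, the sketch below parses a sample value with the example's data_dt_format and data_dt_timezone, then converts it to a microsecond Unix timestamp. The assumption that "muts" denotes microseconds since the Unix epoch is ours; the parsing code is illustrative, not Data Liberator's implementation.

```python
from datetime import datetime, timedelta, timezone

# A sample cell from the "timestamp" column of the CSV.
raw = "2024-01-15 09:30:00.123456"

# Parse with data_dt_format, then attach data_dt_timezone ("UTC").
dt = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)

# Integer microseconds since the Unix epoch, computed with exact
# timedelta arithmetic to avoid floating-point rounding.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
muts = (dt - epoch) // timedelta(microseconds=1)
print(muts)
```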

Parquet Dataset

Parquet datasets provide passthrough access to Parquet data without intermediate caching, leveraging Arrow’s native Parquet reader.
Parquet files offer zero-copy reads and columnar pushdown, which is significantly more efficient than converting Parquet to CSV.

Complete Parquet Example

{
  "name": "s3-equity-bars-parquet",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "bars_*.parquet",
    "data_dt_column": "bar_time",
    "data_dt_format": "datetime",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "bars_(\\d{4}-\\d{2}-\\d{2})\\.parquet",
    "fname_dt_format": "%Y-%m-%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "bar_time", "type": "string", "group": "time", "description": "Bar timestamp" },
    { "name": "open", "type": "double", "group": "value", "description": "Open price" },
    { "name": "high", "type": "double", "group": "value", "description": "High price" },
    { "name": "low", "type": "double", "group": "value", "description": "Low price" },
    { "name": "close", "type": "double", "group": "value", "description": "Close price" },
    { "name": "volume", "type": "int64", "group": "value", "description": "Bar volume" }
  ]
}

S3-Compatible Storage

MinIO Example

{
  "name": "minio-connection",
  "connection_type": "s3",
  "aws_access_key_id": "minio-access-key",
  "aws_secret_access_key": "minio-secret-key",
  "bucket": "market-data",
  "endpoint": "https://minio.internal.example.com:9000",
  "request_style": "path"
}
Most S3-compatible services require request_style set to "path". Only AWS S3 defaults to virtual-hosted style.
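The practical difference between the two styles is where the bucket name appears in the request URL. The helper below is hypothetical, but the URL shapes follow standard S3 conventions: path style puts the bucket in the path, virtual-hosted style makes it a subdomain.

```python
def object_url(endpoint, bucket, key, request_style="virtual"):
    """Build the request URL for an object under each S3 request style."""
    scheme, host = endpoint.split("://", 1)
    if request_style == "path":
        # Path style: bucket in the path -- required by most
        # S3-compatible services (MinIO, Wasabi, ...).
        return f"{scheme}://{host}/{bucket}/{key}"
    # Virtual-hosted style: bucket as a subdomain -- the AWS S3 default.
    return f"{scheme}://{bucket}.{host}/{key}"

print(object_url("https://s3.amazonaws.com", "my-market-data",
                 "daily/equities/trades_20240115.csv"))
print(object_url("https://minio.internal.example.com:9000", "market-data",
                 "bars.parquet", "path"))
```

Virtual-hosted style fails for most self-hosted services because their TLS certificates and DNS do not cover per-bucket subdomains, which is why path style is required there.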

IAM Permissions

The IAM user or role associated with the access key needs at minimum:
  • s3:GetObject on the bucket objects
  • s3:ListBucket on the bucket
See the S3 Bucket Setup guide for detailed IAM policy configuration.
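A minimal IAM policy granting those two actions might look like the sketch below. The bucket name and statement Sids are placeholders; note that s3:ListBucket applies to the bucket ARN while s3:GetObject applies to the objects under it (the /* suffix).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DataLiberatorListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-market-data"
    },
    {
      "Sid": "DataLiberatorGetObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-market-data/*"
    }
  ]
}
```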