Amazon S3

S3 datasources allow CloudQuant Data Liberator to read CSV, TSV, and Parquet files directly from Amazon S3 buckets or S3-compatible object storage services (MinIO, Wasabi, Backblaze B2, etc.).

Connection Configuration

Required Fields

Field | Type | Description
connection_type | string | Must be "s3"
aws_access_key_id | string | AWS access key ID
aws_secret_access_key | string | AWS secret access key
bucket | string | S3 bucket name
endpoint | string | S3 endpoint URL, e.g., "https://s3.amazonaws.com"

Optional Fields

Field | Type | Default | Description
prefix | string | "" | Key prefix (virtual directory) within the bucket
request_style | string | "virtual" | S3 request style: "path" or "virtual"
mount_point | string | (none) | Local mount path for FUSE-based access
config_name | string | (none) | Internal configuration identifier
object_key | string | (none) | Object key pattern for file selection
For S3-compatible services (MinIO, Wasabi, etc.), set request_style to "path" and point endpoint at your service. For AWS S3, keep the default, virtual-hosted style ("virtual").
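To make the difference between the two request styles concrete, here is a minimal Python sketch of how an object URL is formed in each style under the standard S3 addressing rules: path style puts the bucket in the URL path, virtual-hosted style puts it in the hostname. The helper object_url is hypothetical and is not part of Data Liberator.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, request_style: str = "virtual") -> str:
    """Build an S3 object URL in path or virtual-hosted style (illustrative only)."""
    parts = urlsplit(endpoint)
    encoded_key = quote(key, safe="/")  # percent-encode the key but keep "/" separators
    if request_style == "path":
        # Path style: scheme://endpoint-host/bucket/key
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{encoded_key}", "", ""))
    if request_style == "virtual":
        # Virtual-hosted style: scheme://bucket.endpoint-host/key
        return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{encoded_key}", "", ""))
    raise ValueError(f"unknown request_style: {request_style!r}")


print(object_url("https://s3.amazonaws.com", "my-market-data", "daily/equities/trades_20240102.csv"))
# https://my-market-data.s3.amazonaws.com/daily/equities/trades_20240102.csv
print(object_url("https://minio.internal.example.com:9000", "market-data", "trades.csv", "path"))
# https://minio.internal.example.com:9000/market-data/trades.csv
```

Virtual-hosted style only works if bucket.host resolves to the service, which is why self-hosted services are usually accessed path-style.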

Example Connection

{
  "name": "s3-market-data",
  "connection_type": "s3",
  "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "bucket": "my-market-data",
  "endpoint": "https://s3.amazonaws.com",
  "prefix": "daily/equities/",
  "request_style": "virtual"
}
The keys above are AWS's published example placeholders. Never commit real AWS credentials to version control; use environment variables or a secrets manager to inject them at deployment time.
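One way to follow this advice is to keep the credentials out of the definition file entirely and fill them in when the connection definition is generated. The sketch below assumes the conventional AWS environment variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. build_s3_connection is a hypothetical helper, and how the resulting dictionary is submitted to Data Liberator is outside its scope.

```python
import os


def build_s3_connection(name: str, bucket: str, endpoint: str, prefix: str = "",
                        request_style: str = "virtual") -> dict:
    """Assemble an S3 connection definition, reading credentials from the environment.

    Hypothetical helper: the variable names follow the usual AWS convention.
    """
    try:
        access_key = os.environ["AWS_ACCESS_KEY_ID"]
        secret_key = os.environ["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        # Fail loudly instead of writing an empty credential into the definition.
        raise RuntimeError(f"environment variable {missing} is not set") from None
    return {
        "name": name,
        "connection_type": "s3",
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "bucket": bucket,
        "endpoint": endpoint,
        "prefix": prefix,
        "request_style": request_style,
    }
```

Calling build_s3_connection("s3-market-data", "my-market-data", "https://s3.amazonaws.com", prefix="daily/equities/") yields the example connection with the real keys substituted at run time. Avoid logging the returned dictionary as-is, since it contains the secret.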

CSV/TSV Dataset

The data_args are identical to those for Local File sources. The file_pattern is evaluated relative to the prefix configured on the connection.
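As a rough model of "relative to the prefix", the sketch below strips the connection prefix from each object key and matches the remainder against file_pattern using Python's fnmatch. This is an assumption about the matching rules, not Data Liberator's implementation. In particular, fnmatch's * also matches "/", so whether keys in nested "subdirectories" are selected may differ in practice.

```python
from fnmatch import fnmatchcase


def select_keys(keys: list[str], prefix: str, file_pattern: str) -> list[str]:
    """Return the object keys under `prefix` whose relative name matches `file_pattern`."""
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the connection's virtual directory
        relative = key[len(prefix):]  # the part file_pattern is matched against
        if fnmatchcase(relative, file_pattern):
            selected.append(key)
    return selected


keys = [
    "daily/equities/trades_20240102.csv",
    "daily/equities/trades_20240103.csv",
    "daily/equities/quotes_20240102.csv",
    "daily/fx/trades_20240102.csv",
]
print(select_keys(keys, "daily/equities/", "trades_*.csv"))
# ['daily/equities/trades_20240102.csv', 'daily/equities/trades_20240103.csv']
```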

Required data_args

Field | Type | Description
file_pattern | string | Glob pattern relative to the prefix, e.g., "*.csv"
data_dt_column | string or list | Column(s) containing the datetime value
data_dt_format | string or list | strptime format, or one of the special values "muts", "uts", "datetime"
data_key_column | string or list | Symbol/key column(s)
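The sketch below shows how a strptime-style data_dt_format and data_dt_timezone might combine: parse the text, attach the source timezone, and convert the result to integer microseconds since the Unix epoch in UTC. The function to_epoch_micros is hypothetical. It is our assumption, based on the name, that the "muts" column corresponds to this kind of microsecond timestamp. The special values "muts", "uts", and "datetime" are not handled here.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _tz(name: str) -> tzinfo:
    # Use the built-in UTC object so the common case needs no tz database lookup.
    return timezone.utc if name.upper() == "UTC" else ZoneInfo(name)


def to_epoch_micros(value: str, data_dt_format: str, data_dt_timezone: str = "UTC") -> int:
    """Parse `value` with a strptime format (assumed to contain no %z), interpret it
    in the source timezone, and return integer microseconds since the Unix epoch."""
    naive = datetime.strptime(value, data_dt_format)
    aware = naive.replace(tzinfo=_tz(data_dt_timezone))
    # Integer division of timedeltas avoids floating-point rounding.
    return (aware - EPOCH) // timedelta(microseconds=1)


print(to_epoch_micros("2024-01-02 09:30:00.000000", "%Y-%m-%d %H:%M:%S.%f"))
# 1704187800000000
```

With data_dt_timezone set to "America/New_York", the same wall-clock string maps to 14:30 UTC in winter. That is why the timezone setting has to match the source data.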

Optional data_args

Field | Type | Default | Description
sep_override | string | "," | Delimiter character
encoding | string | "utf-8" | File encoding
data_dt_timezone | string | "UTC" | Source data timezone
fname_dt_regex | string | (none) | Regex to extract the date from the filename
fname_dt_format | string | (none) | strptime format for the filename date
fname_dt_timezone | string | (none) | Timezone of the filename date
fname_dt_nudge | int | 0 | Microsecond offset applied to the filename date
fname_dt_approx_seconds | int | (none) | Approximate time span covered by each file, in seconds (e.g., 86400 for daily files)
arrow_sort | list | ["symbol", "muts"] | Columns used to sort the output
arrow_timestamp | bool | true | Generate a human-readable timestamp column
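The fname_dt_* options work together. fname_dt_regex captures the date text in the object name, fname_dt_format parses it, fname_dt_timezone localizes it, and fname_dt_nudge shifts it by a number of microseconds. The sketch below applies them in that order. It assumes the date is in the first capture group and treats fname_dt_approx_seconds as the length of the interval the file covers; both are assumptions. file_time_range is a hypothetical helper, not part of Data Liberator.

```python
import re
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional, Tuple
from zoneinfo import ZoneInfo


def _tz(name: str) -> tzinfo:
    return timezone.utc if name.upper() == "UTC" else ZoneInfo(name)


def file_time_range(
    filename: str,
    fname_dt_regex: str,
    fname_dt_format: str,
    fname_dt_timezone: str,
    fname_dt_nudge: int = 0,
    fname_dt_approx_seconds: Optional[int] = None,
) -> Optional[Tuple[datetime, Optional[datetime]]]:
    """Return (start, end) for a file, or None if the name does not match the regex."""
    match = re.search(fname_dt_regex, filename)
    if match is None:
        return None
    # Assumption: the first capture group holds the date text.
    start = datetime.strptime(match.group(1), fname_dt_format)
    start = start.replace(tzinfo=_tz(fname_dt_timezone))
    start += timedelta(microseconds=fname_dt_nudge)
    end = None
    if fname_dt_approx_seconds is not None:
        end = start + timedelta(seconds=fname_dt_approx_seconds)
    return start, end


start, end = file_time_range("daily/equities/trades_20240102.csv",
                             r"trades_(\d{8})\.csv", "%Y%m%d", "UTC",
                             fname_dt_approx_seconds=86400)
print(start.isoformat(), end.isoformat())
# 2024-01-02T00:00:00+00:00 2024-01-03T00:00:00+00:00
```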

Complete CSV Example

{
  "name": "s3-equity-trades",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": "timestamp",
    "data_dt_format": "%Y-%m-%d %H:%M:%S.%f",
    "data_dt_timezone": "UTC",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv",
    "fname_dt_format": "%Y%m%d",
    "fname_dt_timezone": "UTC",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "timestamp", "type": "string", "group": "time", "description": "Trade timestamp" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "size", "type": "int64", "group": "value", "description": "Trade size" },
    { "name": "exchange", "type": "string", "group": "value", "description": "Exchange code" }
  ]
}
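Two details in this example are easy to get wrong. First, the backslashes in fname_dt_regex are doubled because the value is a JSON string; once parsed, the regex is trades_(\d{8})\.csv. Second, the columns named in data_key_column and data_dt_column appear in the schema with the "key" and "time" groups. The pre-flight check below covers both points, using a trimmed copy of the example. check_dataset is hypothetical, and the group rule is inferred from the examples on this page rather than a documented requirement.

```python
import json
import re

# Raw string so Python keeps the JSON escapes ("\\d") exactly as written in the file.
DATASET_JSON = r"""
{
  "name": "s3-equity-trades",
  "data_args": {
    "file_pattern": "trades_*.csv",
    "data_dt_column": "timestamp",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv"
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key" },
    { "name": "timestamp", "type": "string", "group": "time" },
    { "name": "price", "type": "double", "group": "value" }
  ]
}
"""


def as_list(value):
    # data_dt_column and data_key_column may be a string or a list of strings.
    return value if isinstance(value, list) else [value]


def check_dataset(dataset: dict, sample_filename: str) -> list[str]:
    """Return a list of problems found in a dataset definition (empty if none)."""
    problems = []
    args = dataset["data_args"]
    groups = {col["name"]: col.get("group") for col in dataset.get("schema", [])}
    for column in as_list(args["data_key_column"]):
        if groups.get(column) != "key":
            problems.append(f"key column {column!r} is not in the schema with group 'key'")
    for column in as_list(args["data_dt_column"]):
        if groups.get(column) != "time":
            problems.append(f"time column {column!r} is not in the schema with group 'time'")
    regex = args.get("fname_dt_regex")
    if regex and not re.search(regex, sample_filename):
        problems.append(f"fname_dt_regex {regex!r} does not match {sample_filename!r}")
    return problems


dataset = json.loads(DATASET_JSON)
print(dataset["data_args"]["fname_dt_regex"])  # trades_(\d{8})\.csv
print(check_dataset(dataset, "trades_20240102.csv"))  # []
```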

Parquet Dataset

Parquet datasets provide passthrough access to Parquet data without intermediate caching, using Arrow's native Parquet reader.
Because Parquet is a typed, columnar format, the reader can load only the columns it needs and skip text parsing entirely. This is typically much more efficient than converting Parquet files to CSV first.

Complete Parquet Example

{
  "name": "s3-equity-bars-parquet",
  "connection": "s3-market-data",
  "data_args": {
    "file_pattern": "bars_*.parquet",
    "data_dt_column": "bar_time",
    "data_dt_format": "datetime",
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "bars_(\\d{4}-\\d{2}-\\d{2})\\.parquet",
    "fname_dt_format": "%Y-%m-%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "bar_time", "type": "string", "group": "time", "description": "Bar timestamp" },
    { "name": "open", "type": "double", "group": "value", "description": "Open price" },
    { "name": "high", "type": "double", "group": "value", "description": "High price" },
    { "name": "low", "type": "double", "group": "value", "description": "Low price" },
    { "name": "close", "type": "double", "group": "value", "description": "Close price" },
    { "name": "volume", "type": "int64", "group": "value", "description": "Bar volume" }
  ]
}

S3-Compatible Storage

MinIO Example

{
  "name": "minio-connection",
  "connection_type": "s3",
  "aws_access_key_id": "minio-access-key",
  "aws_secret_access_key": "minio-secret-key",
  "bucket": "market-data",
  "endpoint": "https://minio.internal.example.com:9000",
  "request_style": "path"
}
Many S3-compatible services, including typical MinIO deployments, expect path-style requests, so set request_style to "path" for them. The default, virtual-hosted style, is intended for AWS S3.

IAM Permissions

The IAM user or role associated with the access key needs, at minimum:
  • s3:GetObject on the bucket's objects (arn:aws:s3:::your-bucket/*)
  • s3:ListBucket on the bucket itself (arn:aws:s3:::your-bucket)
See the S3 Bucket Setup guide for detailed IAM policy configuration.
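A minimal read-only policy granting these two permissions can be generated as shown below. The ARN formats and the s3:prefix condition key are standard IAM. The helper name and the optional prefix scoping are illustrative, so check the S3 Bucket Setup guide for the policy CloudQuant recommends.

```python
import json
from typing import Optional


def read_only_policy(bucket: str, prefix: Optional[str] = None) -> dict:
    """Build an IAM policy granting s3:ListBucket on the bucket and s3:GetObject on its objects.

    If `prefix` is given, both grants are narrowed to keys under that prefix.
    """
    list_statement = {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": f"arn:aws:s3:::{bucket}",  # ListBucket applies to the bucket ARN
    }
    if prefix:
        # s3:prefix limits which key prefixes a ListBucket request may ask for.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    get_statement = {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/{prefix or ''}*",  # GetObject applies to object ARNs
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, get_statement]}


print(json.dumps(read_only_policy("my-market-data", "daily/equities/"), indent=2))
```

If you scope the policy to a prefix, the connection's prefix must fall under it; otherwise listing the files will be denied.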