SFTP
SFTP (SSH File Transfer Protocol) datasources allow CloudQuant Data Liberator to read CSV, TSV, and other delimited files from remote servers over an encrypted SSH connection. CloudQuant Data Liberator mounts the remote directory via SSHFS/FUSE.
Connection Configuration
Required Fields
| Field | Type | Description |
|---|---|---|
| connection_type | string | Must be "sftp" |
| host | string | SFTP server hostname or IP address |
| user | string | Username for authentication |
You must provide either password or key for authentication. If both are specified, key-based authentication takes precedence.
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
| port | int | 22 | SSH port number |
| password | string | | Password for password-based authentication |
| key | string | | SSH private key content (PEM format) for key-based authentication |
| prefix | string | "" | Remote directory path to use as root |
| mount_point | string | | Local mount path for FUSE-based access |
| config_name | string | | Internal configuration identifier |
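The field rules above can be checked before a connection is registered. The following is a minimal sketch, not part of CloudQuant Data Liberator: the function name `normalize_sftp_connection` and the `auth_method` key it adds are hypothetical and exist only for illustration. It enforces the required fields, applies the documented defaults for port and prefix, and reports which authentication method would take precedence.

```python
REQUIRED_FIELDS = ("connection_type", "host", "user")
DEFAULTS = {"port": 22, "prefix": ""}  # defaults taken from the table above


def normalize_sftp_connection(conn: dict) -> dict:
    """Validate an SFTP connection dict and fill in the documented defaults."""
    missing = [field for field in REQUIRED_FIELDS if not conn.get(field)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    if conn["connection_type"] != "sftp":
        raise ValueError('connection_type must be "sftp"')
    if not conn.get("password") and not conn.get("key"):
        raise ValueError("provide either password or key")

    result = {**DEFAULTS, **conn}
    # Documented precedence: key-based authentication wins when both are set.
    # "auth_method" is a hypothetical helper key, not a Liberator field.
    result["auth_method"] = "key" if conn.get("key") else "password"
    return result
```

Running a check like this in a deployment script surfaces a missing credential before the connection is first used.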
Example Connection (Password Authentication)
{
  "name": "sftp-vendor-data",
  "connection_type": "sftp",
  "host": "sftp.vendor.example.com",
  "port": 22,
  "user": "datauser",
  "password": "s3cur3P@ssw0rd",
  "prefix": "/data/daily-feeds/"
}
Example Connection (Key Authentication)
{
  "name": "sftp-internal-data",
  "connection_type": "sftp",
  "host": "data-server.internal.net",
  "port": 2222,
  "user": "liberator-svc",
  "key": "-----BEGIN OPENSSH PRIVATE KEY-----\nb3BlbnNza...\n-----END OPENSSH PRIVATE KEY-----",
  "prefix": "/exports/market-data/"
}
Avoid embedding private keys or passwords directly in configuration files. Use environment variables or a secrets manager to inject credentials at deployment time.
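One way to follow this advice is to assemble the connection at deployment time from environment variables. The sketch below is an illustration under stated assumptions: the variable names (`SFTP_HOST`, `SFTP_PORT`, `SFTP_USER`, `SFTP_PREFIX`, `SFTP_KEY_FILE`, `SFTP_PASSWORD`) and the function `build_connection_from_env` are hypothetical, and how the resulting dictionary is submitted to CloudQuant Data Liberator is not covered here.

```python
import os
from collections.abc import Mapping


def build_connection_from_env(env: Mapping = os.environ) -> dict:
    """Build an SFTP connection dict without hard-coding credentials."""
    conn = {
        "name": "sftp-vendor-data",
        "connection_type": "sftp",
        "host": env["SFTP_HOST"],
        "port": int(env.get("SFTP_PORT", "22")),
        "user": env["SFTP_USER"],
        "prefix": env.get("SFTP_PREFIX", ""),
    }
    key_file = env.get("SFTP_KEY_FILE")  # hypothetical: path to a PEM private key
    if key_file:
        with open(key_file, encoding="utf-8") as fh:
            conn["key"] = fh.read()
    elif "SFTP_PASSWORD" in env:
        conn["password"] = env["SFTP_PASSWORD"]
    else:
        raise KeyError("set SFTP_KEY_FILE or SFTP_PASSWORD")
    return conn
```

Because the function accepts any mapping, a secrets manager client can pass its own dictionary instead of `os.environ`.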
Dataset Configuration (data_args)
The data_args fields are the same as for every other file-based source. See Local File for the full reference. The file_pattern is evaluated relative to the prefix configured on the connection.
Required data_args
| Field | Type | Description |
|---|---|---|
| file_pattern | string | Glob pattern relative to the prefix, e.g., "*.csv" |
| data_dt_column | string or list | Column(s) containing the datetime value |
| data_dt_format | string or list | strptime format or special values ("muts", "uts", "datetime") |
| data_key_column | string or list | Symbol/key column(s) |
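To make "relative to the prefix" concrete, the sketch below filters a list of remote paths the way a glob evaluated under the prefix might. It uses Python's `fnmatch` module as a stand-in; the exact glob semantics CloudQuant Data Liberator applies are not documented here, and the helper `match_files` is hypothetical. Note that in `fnmatch`, `*` also matches `/`, which may differ from a stricter glob implementation.

```python
from fnmatch import fnmatchcase


def match_files(remote_paths, prefix, file_pattern):
    """Return the paths under prefix whose relative name matches file_pattern."""
    root = prefix.rstrip("/") + "/"
    matches = []
    for path in remote_paths:
        if not path.startswith(root):
            continue  # outside the connection's prefix
        relative = path[len(root):]
        if fnmatchcase(relative, file_pattern):
            matches.append(relative)
    return matches


paths = [
    "/data/daily-feeds/prices.csv",
    "/data/daily-feeds/prices.tsv",
    "/archive/prices.csv",
]
print(match_files(paths, "/data/daily-feeds/", "*.csv"))
```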
Optional data_args
| Field | Type | Default | Description |
|---|---|---|---|
| sep_override | string | "," | Delimiter character |
| encoding | string | "utf-8" | File encoding |
| data_dt_timezone | string | "UTC" | Source data timezone |
| fname_dt_regex | string | | Regex to extract date from filename |
| fname_dt_format | string | | strptime format for filename date |
| fname_dt_timezone | string | | Timezone of filename date |
| fname_dt_nudge | int | 0 | Microsecond offset for filename date |
| fname_dt_approx_seconds | int | | Approximate seconds per file |
| arrow_sort | list | ["symbol", "muts"] | Sort order |
| arrow_timestamp | bool | true | Generate human-readable timestamp column |
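The fname_dt_* fields work together to turn a filename into a date. The sketch below shows one plausible reading of that process and is not the product's implementation: it assumes the first capture group of fname_dt_regex holds the date text, and that fname_dt_nudge is simply added as microseconds. The function `filename_datetime` is hypothetical. The timezone may be given as an IANA name (resolved through `zoneinfo`, which needs time zone data on the host) or as a `tzinfo` object.

```python
import re
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def filename_datetime(filename, fname_dt_regex, fname_dt_format,
                      fname_dt_timezone, fname_dt_nudge=0):
    """Extract the filename date and return it as a UTC datetime, or None."""
    match = re.search(fname_dt_regex, filename)
    if match is None:
        return None  # filename does not carry a date in the expected shape
    tz = (ZoneInfo(fname_dt_timezone)
          if isinstance(fname_dt_timezone, str) else fname_dt_timezone)
    # Assumption: capture group 1 contains the text matching fname_dt_format.
    naive = datetime.strptime(match.group(1), fname_dt_format)
    local = naive.replace(tzinfo=tz) + timedelta(microseconds=fname_dt_nudge)
    return local.astimezone(timezone.utc)
```

For a file named `trades_20240115.csv.gz`, the regex `trades_(\d{8})\.csv\.gz` with format `%Y%m%d` yields midnight on 15 January 2024 in the configured timezone, converted to UTC.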
Complete Example
Connection
{
  "name": "sftp-trades-feed",
  "connection_type": "sftp",
  "host": "sftp.dataprovider.com",
  "port": 22,
  "user": "cq-ingest",
  "password": "vendorPassword123",
  "prefix": "/feeds/trades/"
}
Dataset
{
  "name": "vendor-trades",
  "connection": "sftp-trades-feed",
  "data_args": {
    "file_pattern": "trades_*.csv.gz",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": ["date", "time"],
    "data_dt_format": ["%Y%m%d", "%H:%M:%S.%f"],
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv\\.gz",
    "fname_dt_format": "%Y%m%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "date", "type": "string", "group": "time", "description": "Trade date" },
    { "name": "time", "type": "string", "group": "time", "description": "Trade time" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "size", "type": "int64", "group": "value", "description": "Trade size" },
    { "name": "condition", "type": "string", "group": "value", "description": "Sale condition code" }
  ]
}
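The dataset above pairs two columns (date, time) with two strptime formats. The sketch below shows one way such a pair could be combined into a single timestamp. It rests on assumptions the source does not state: that the column values and formats are joined in order, and that "muts" denotes microseconds since the Unix epoch in UTC. The helper `row_timestamp_us` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def row_timestamp_us(row, data_dt_column, data_dt_format, data_dt_timezone):
    """Combine the datetime columns of one row into epoch microseconds (UTC)."""
    tz = (ZoneInfo(data_dt_timezone)
          if isinstance(data_dt_timezone, str) else data_dt_timezone)
    # Assumption: columns and formats are joined pairwise, in list order.
    text = " ".join(row[column] for column in data_dt_column)
    fmt = " ".join(data_dt_format)
    local = datetime.strptime(text, fmt).replace(tzinfo=tz)
    # Integer division by a timedelta avoids floating-point rounding.
    return (local - EPOCH) // timedelta(microseconds=1)
```

Passing `"America/New_York"` as the timezone applies the correct daylight-saving offset for each date, which a fixed UTC offset would not.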
CloudQuant Data Liberator supports reading gzip-compressed files (.csv.gz) transparently. Use compressed files on SFTP connections to reduce transfer time over slow or high-latency links.
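To preview a compressed feed file locally before pointing a dataset at it, a short script along these lines may help. It is a sketch using only the Python standard library and says nothing about how CloudQuant Data Liberator reads files internally; the function `read_delimited` is hypothetical, and its `sep` and `encoding` parameters merely mirror sep_override and encoding.

```python
import csv
import gzip


def read_delimited(path, sep=",", encoding="utf-8"):
    """Read a delimited file into a list of dicts, decompressing .gz on the fly."""
    opener = gzip.open if str(path).endswith(".gz") else open
    # newline="" lets the csv module handle line endings itself.
    with opener(path, mode="rt", encoding=encoding, newline="") as fh:
        return list(csv.DictReader(fh, delimiter=sep))
```

Calling `read_delimited("trades_20240115.csv.gz")` returns one dictionary per row, keyed by the header names, so the columns can be compared against the dataset schema.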
Network Requirements
Ensure that the CloudQuant Data Liberator host meets the following connectivity requirements:
| Requirement | Detail |
|---|---|
| Outbound port | TCP port 22 (or the custom port set in the connection's port field) to the SFTP server |
| DNS resolution | The hostname must resolve from the CloudQuant Data Liberator host |
| Firewall rules | Whitelist the CloudQuant Data Liberator host IP on the SFTP server |
| SSH host key | The server’s host key must be trusted (added to known_hosts) |
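The first two requirements can be verified from the CloudQuant Data Liberator host with a short script. The sketch below is a generic check built on Python's `socket` module, not a product tool, and the function name `check_sftp_reachable` is hypothetical. It confirms DNS resolution and an outbound TCP connection only; it does not authenticate or verify the SSH host key.

```python
import socket


def check_sftp_reachable(host, port=22, timeout=5.0):
    """Return (ok, message) after testing DNS resolution and a TCP connect."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS resolution failed: {exc}"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection opened and closed; nothing is sent
    except OSError as exc:
        return False, f"TCP connection failed: {exc}"
    return True, f"{host}:{port} is reachable"
```

A failure at the DNS step points to name resolution on the host, while a TCP failure usually indicates a firewall rule or a wrong port.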