
SFTP

SFTP (SSH File Transfer Protocol) datasources let CloudQuant Data Liberator read CSV, TSV, and other delimited files from remote servers over an encrypted SSH connection. CloudQuant Data Liberator mounts the remote directory locally using SSHFS, a FUSE-based filesystem, and reads the files through that mount.
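
The mount step is roughly what the following Python sketch assembles: an sshfs command line built from the connection fields. It is not CloudQuant Data Liberator's actual invocation, which this page does not document. The function name build_sshfs_command and the mount path are hypothetical. The sketch uses only standard sshfs and ssh options: the [user@]host:[dir] remote argument, -p PORT, and -o IdentityFile=PATH.

```python
import shlex
from typing import List, Optional


def build_sshfs_command(host: str, user: str, mount_point: str,
                        prefix: str = "", port: int = 22,
                        identity_file: Optional[str] = None) -> List[str]:
    """Return an argv list that mounts user@host:prefix at mount_point via sshfs."""
    # sshfs names the remote side as [user@]host:[dir]; an empty dir mounts
    # the remote user's home directory.
    remote = f"{user}@{host}:{prefix}"
    cmd = ["sshfs", remote, mount_point, "-p", str(port)]
    if identity_file is not None:
        # IdentityFile is an ssh_config option; sshfs passes -o options through to ssh.
        cmd += ["-o", f"IdentityFile={identity_file}"]
    return cmd


if __name__ == "__main__":
    # Values from the password-authentication example; the mount path is hypothetical.
    argv = build_sshfs_command("sftp.vendor.example.com", "datauser",
                               "/mnt/sftp-vendor-data", prefix="/data/daily-feeds/")
    print(shlex.join(argv))
```

Returning an argv list instead of a single string avoids shell-quoting problems when the list is passed to subprocess.run.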

Connection Configuration

Required Fields

| Field | Type | Description |
|---|---|---|
| connection_type | string | Must be "sftp" |
| host | string | SFTP server hostname or IP address |
| user | string | Username for authentication |

You must provide either password or key for authentication. If both are specified, key-based authentication takes precedence.
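
The required-field and authentication rules above can be checked before a connection is registered. The following is a minimal Python sketch. The function name select_auth_method is hypothetical, and treating an empty string the same as a missing value is an assumption of this sketch.

```python
REQUIRED_FIELDS = ("connection_type", "host", "user")


def select_auth_method(conn: dict) -> str:
    """Validate required fields and return "key" or "password"."""
    missing = [name for name in REQUIRED_FIELDS if not conn.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    if conn["connection_type"] != "sftp":
        raise ValueError('connection_type must be "sftp"')
    # Assumption: a missing, null, or empty-string value counts as "not provided".
    if conn.get("key"):
        return "key"  # key-based authentication wins when both are present
    if conn.get("password"):
        return "password"
    raise ValueError("provide either 'password' or 'key'")


if __name__ == "__main__":
    both = {"connection_type": "sftp", "host": "h", "user": "u",
            "password": "secret", "key": "<private key content>"}
    print(select_auth_method(both))  # key
```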

Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| port | int | 22 | SSH port number |
| password | string | | Password for password-based authentication |
| key | string | | SSH private key content (PEM format) for key-based authentication |
| prefix | string | "" | Remote directory path to use as the root |
| mount_point | string | | Local mount path for FUSE-based access |
| config_name | string | | Internal configuration identifier |

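
One way to apply the defaults from the optional-fields table is to load the connection JSON into a typed structure. The following Python dataclass is a hypothetical sketch, not part of CloudQuant Data Liberator. Including "name" (which the examples use but the tables do not list) and rejecting unknown keys are choices made by this sketch.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SftpConnection:
    # Required fields (no defaults).
    connection_type: str
    host: str
    user: str
    # Optional fields, with the defaults listed in the optional-fields table.
    port: int = 22
    password: Optional[str] = None
    key: Optional[str] = None
    prefix: str = ""
    mount_point: Optional[str] = None
    config_name: Optional[str] = None
    # "name" appears in the examples but not in the field tables.
    name: Optional[str] = None

    def __post_init__(self) -> None:
        if self.connection_type != "sftp":
            raise ValueError('connection_type must be "sftp"')

    @classmethod
    def from_json(cls, text: str) -> "SftpConnection":
        raw = json.loads(text)
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - known)
        if unknown:
            # Fail loudly on typos such as "hostname" instead of "host".
            raise ValueError(f"unknown field(s): {unknown}")
        return cls(**raw)
```

A missing required field surfaces as a TypeError from the generated constructor.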

Example Connection (Password Authentication)

{
  "name": "sftp-vendor-data",
  "connection_type": "sftp",
  "host": "sftp.vendor.example.com",
  "port": 22,
  "user": "datauser",
  "password": "s3cur3P@ssw0rd",
  "prefix": "/data/daily-feeds/"
}

Example Connection (Key Authentication)

{
  "name": "sftp-internal-data",
  "connection_type": "sftp",
  "host": "data-server.internal.net",
  "port": 2222,
  "user": "liberator-svc",
  "key": "-----BEGIN OPENSSH PRIVATE KEY-----\nb3BlbnNza...\n-----END OPENSSH PRIVATE KEY-----",
  "prefix": "/exports/market-data/"
}
Avoid embedding private keys or passwords directly in configuration files. Use environment variables or a secrets manager to inject credentials at deployment time.
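
The following minimal Python sketch injects credentials at deployment time. It assumes hypothetical environment variable names, SFTP_PRIVATE_KEY and SFTP_PASSWORD, and assumes that the rendered JSON is passed to whatever mechanism registers the connection. Serializing with json.dumps also escapes the key's line breaks as \n, which matches the key-authentication example.

```python
import json
import os
from typing import Mapping


def render_connection(template: dict, env: Mapping[str, str] = os.environ) -> str:
    """Return connection JSON with credentials taken from environment variables."""
    conn = dict(template)  # copy, so the checked-in template never holds secrets
    # Hypothetical variable names; substitute whatever your secrets tooling exports.
    key = env.get("SFTP_PRIVATE_KEY")
    password = env.get("SFTP_PASSWORD")
    if key:
        conn["key"] = key  # multi-line private key; json.dumps escapes newlines as \n
    elif password:
        conn["password"] = password
    else:
        raise KeyError("set SFTP_PRIVATE_KEY or SFTP_PASSWORD")
    return json.dumps(conn, indent=2)
```

The template contains only the name, connection_type, host, port, user, and prefix. Secrets exist only in the process environment and in the rendered output.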

Dataset Configuration (data_args)

The data_args fields are the same for all file-based sources; see Local File for the full reference. The file_pattern is evaluated relative to the prefix configured on the connection.
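
The relationship between prefix and file_pattern is shown in the following Python sketch. The function names are hypothetical. The sketch assumes non-recursive glob semantics, where * and ? never cross a "/". This page does not say whether subdirectories of the prefix are searched.

```python
import posixpath
from fnmatch import fnmatchcase


def effective_pattern(prefix: str, file_pattern: str) -> str:
    """Anchor file_pattern under the connection prefix."""
    # posixpath.join drops the prefix if file_pattern starts with "/",
    # so keep patterns relative, as the data_args reference requires.
    return posixpath.normpath(posixpath.join(prefix, file_pattern))


def glob_match(remote_path: str, pattern: str) -> bool:
    """Non-recursive glob: match path components one by one."""
    path_parts = posixpath.normpath(remote_path).split("/")
    pattern_parts = pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))
```

With prefix "/feeds/trades/" and file_pattern "trades_*.csv.gz", the effective pattern is "/feeds/trades/trades_*.csv.gz". That pattern matches /feeds/trades/trades_20240102.csv.gz but not a file in /feeds/trades/archive/.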

Required data_args

| Field | Type | Description |
|---|---|---|
| file_pattern | string | Glob pattern relative to the prefix, e.g., "*.csv" |
| data_dt_column | string or list | Column(s) containing the datetime value |
| data_dt_format | string or list | strptime format(s), or a special value ("muts", "uts", "datetime") |
| data_key_column | string or list | Symbol/key column(s) |

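
The following Python sketch shows how data_dt_column, data_dt_format, and data_dt_timezone could combine when the datetime is split across columns, as in the complete example's ["date", "time"]. Joining multi-column values with a space before parsing is an assumption. So is converting the result to integer microseconds since the Unix epoch, which the microsecond-based fname_dt_nudge and the "muts" sort column suggest but do not confirm. The special values "muts", "uts", and "datetime" are not covered, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Mapping, Union
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_row_datetime(row: Mapping[str, str], dt_column: Union[str, List[str]],
                       dt_format: Union[str, List[str]], tz_name: str = "UTC") -> datetime:
    """Parse a row's datetime from one or more columns using strptime formats."""
    columns = [dt_column] if isinstance(dt_column, str) else list(dt_column)
    formats = [dt_format] if isinstance(dt_format, str) else list(dt_format)
    if len(columns) != len(formats):
        raise ValueError("data_dt_column and data_dt_format need the same length")
    # Assumption: multi-column values are joined with a space and parsed together.
    text = " ".join(row[c] for c in columns)
    naive = datetime.strptime(text, " ".join(formats))
    # timezone.utc avoids needing the system tz database for the UTC default.
    tz = timezone.utc if tz_name == "UTC" else ZoneInfo(tz_name)
    return naive.replace(tzinfo=tz)


def to_epoch_micros(dt: datetime) -> int:
    """Integer microseconds since 1970-01-01T00:00:00Z."""
    return (dt - EPOCH) // timedelta(microseconds=1)
```

For the complete example, the call is parse_row_datetime(row, ["date", "time"], ["%Y%m%d", "%H:%M:%S.%f"], "America/New_York").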

Optional data_args

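| Field | Type | Default | Description |
|---|---|---|---|
| sep_override | string | "," | Delimiter character |
| encoding | string | "utf-8" | File encoding |
| data_dt_timezone | string | "UTC" | Timezone of the datetime values in the source data |
| fname_dt_regex | string | | Regular expression that extracts the date from the filename |
| fname_dt_format | string | | strptime format of the filename date |
| fname_dt_timezone | string | | Timezone of the filename date |
| fname_dt_nudge | int | 0 | Offset, in microseconds, applied to the filename date |
| fname_dt_approx_seconds | int | | Approximate number of seconds of data covered by each file |
| arrow_sort | list | ["symbol", "muts"] | Columns used to sort the output |
| arrow_timestamp | bool | true | Generate a human-readable timestamp column |
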
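
The filename fields might combine as follows: fname_dt_regex extracts the date, fname_dt_format parses it, fname_dt_timezone localizes it, fname_dt_nudge shifts it, and fname_dt_approx_seconds gives an approximate end for the file's time range. The following Python sketch is hypothetical, including its function name. Reading the date from the first capture group and treating the range as start plus fname_dt_approx_seconds are assumptions, because the table describes each field in a single line.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple
from zoneinfo import ZoneInfo


def filename_time_window(fname: str, dt_regex: str, dt_format: str,
                         tz_name: str = "UTC", nudge_us: int = 0,
                         approx_seconds: Optional[int] = None
                         ) -> Optional[Tuple[datetime, Optional[datetime]]]:
    """Return (start, approximate end) derived from a filename, or None if no match."""
    m = re.search(dt_regex, fname)
    if m is None:
        return None
    # Assumption: the date is the first capture group, or the whole match if none.
    text = m.group(1) if m.groups() else m.group(0)
    tz = timezone.utc if tz_name == "UTC" else ZoneInfo(tz_name)
    start = datetime.strptime(text, dt_format).replace(tzinfo=tz)
    start += timedelta(microseconds=nudge_us)  # fname_dt_nudge
    # With a ZoneInfo zone, this addition is wall-clock (not elapsed-time) arithmetic.
    end = start + timedelta(seconds=approx_seconds) if approx_seconds else None
    return start, end
```

With the complete example's values, the file trades_20240102.csv.gz maps to a window that starts at midnight on 2024-01-02 and spans about 86400 seconds.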

Complete Example

Connection

{
  "name": "sftp-trades-feed",
  "connection_type": "sftp",
  "host": "sftp.dataprovider.com",
  "port": 22,
  "user": "cq-ingest",
  "password": "vendorPassword123",
  "prefix": "/feeds/trades/"
}

Dataset

{
  "name": "vendor-trades",
  "connection": "sftp-trades-feed",
  "data_args": {
    "file_pattern": "trades_*.csv.gz",
    "sep_override": ",",
    "encoding": "utf-8",
    "data_dt_column": ["date", "time"],
    "data_dt_format": ["%Y%m%d", "%H:%M:%S.%f"],
    "data_dt_timezone": "America/New_York",
    "data_key_column": "symbol",
    "fname_dt_regex": "trades_(\\d{8})\\.csv\\.gz",
    "fname_dt_format": "%Y%m%d",
    "fname_dt_timezone": "America/New_York",
    "fname_dt_approx_seconds": 86400,
    "arrow_sort": ["symbol", "muts"],
    "arrow_timestamp": true
  },
  "schema": [
    { "name": "symbol", "type": "string", "group": "key", "description": "Ticker symbol" },
    { "name": "date", "type": "string", "group": "time", "description": "Trade date" },
    { "name": "time", "type": "string", "group": "time", "description": "Trade time" },
    { "name": "price", "type": "double", "group": "value", "description": "Trade price" },
    { "name": "size", "type": "int64", "group": "value", "description": "Trade size" },
    { "name": "condition", "type": "string", "group": "value", "description": "Sale condition code" }
  ]
}
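
A dataset definition like this one can be sanity-checked before deployment. In JSON the regex backslashes are doubled, so the pattern the engine receives is trades_(\d{8})\.csv\.gz. The following Python sketch is hypothetical, including its checks and function names. It verifies that the datetime columns and formats line up, that the referenced columns appear in the schema, and that a sample filename matches both file_pattern and fname_dt_regex.

```python
import re
from fnmatch import fnmatchcase
from typing import List


def _as_list(value) -> List[str]:
    return [value] if isinstance(value, str) else list(value)


def check_dataset(dataset: dict, sample_filename: str) -> List[str]:
    """Return a list of problems found in a dataset definition (empty if none)."""
    args = dataset["data_args"]
    problems = []
    schema_names = {col["name"] for col in dataset.get("schema", [])}
    dt_cols = _as_list(args["data_dt_column"])
    dt_fmts = _as_list(args["data_dt_format"])
    if len(dt_cols) != len(dt_fmts):
        problems.append("data_dt_column and data_dt_format differ in length")
    for col in dt_cols + _as_list(args["data_key_column"]):
        if col not in schema_names:
            problems.append(f"column {col!r} is not in the schema")
    if not fnmatchcase(sample_filename, args["file_pattern"]):
        problems.append("sample filename does not match file_pattern")
    regex = args.get("fname_dt_regex")
    if regex and re.search(regex, sample_filename) is None:
        problems.append("fname_dt_regex does not match the sample filename")
    return problems
```

For example, check_dataset(json.load(fh), "trades_20240102.csv.gz") on the dataset above should return an empty list.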
CloudQuant Data Liberator supports reading gzip-compressed files (.csv.gz) transparently. Use compressed files on SFTP connections to reduce transfer time over slow or high-latency links.
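
The following Python sketch shows transparent decompression with the standard library. It applies the sep_override and encoding settings and detects gzip by its two magic bytes (0x1f 0x8b, per RFC 1952). This is one possible approach, not necessarily the one CloudQuant Data Liberator uses, and the function name is hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def read_delimited(path: str, sep: str = ",",
                   encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Yield rows as dicts, decompressing gzip files on the fly."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    # newline="" lets the csv module handle quoted fields that contain line breaks.
    with opener(path, "rt", encoding=encoding, newline="") as fh:
        yield from csv.DictReader(fh, delimiter=sep)
```

Sniffing the content instead of the file extension also handles compressed files that lack a .gz suffix.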

Network Requirements

The CloudQuant Data Liberator host needs the following network access:

| Requirement | Detail |
|---|---|
| Outbound port | TCP port 22 (or the custom port set on the connection) to the SFTP server |
| DNS resolution | The SFTP hostname must resolve from the CloudQuant Data Liberator host |
| Firewall rules | Allowlist the CloudQuant Data Liberator host's IP address on the SFTP server |
| SSH host key | The server’s host key must be trusted (present in known_hosts) |
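
The DNS and outbound-port requirements can be tested from the CloudQuant Data Liberator host before a connection is configured. The following Python sketch, whose function name is hypothetical, resolves the hostname, opens a TCP connection, and returns the server's SSH identification line. RFC 4253 section 4.2 specifies that the server sends this line, beginning with "SSH-2.0-", and allows other text lines before it.

```python
import socket


def check_sftp_reachable(host: str, port: int = 22, timeout: float = 5.0,
                         max_lines: int = 20) -> str:
    """Return the server's SSH identification string, or raise on failure."""
    # DNS resolution: raises socket.gaierror if the hostname does not resolve.
    socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # TCP connect: raises OSError (refused, unreachable, timeout) if blocked.
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rb") as fh:
        # The server may send other text lines before the "SSH-" line.
        for _ in range(max_lines):
            raw = fh.readline(256)
            if not raw:
                break
            line = raw.decode("ascii", errors="replace").rstrip("\r\n")
            if line.startswith("SSH-"):
                return line
    raise ConnectionError("no SSH identification string received")
```

This check does not cover host-key trust. For that requirement, OpenSSH's ssh-keyscan -p PORT HOST prints the server's public host keys. Confirm their fingerprints with the SFTP provider before adding them to known_hosts.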