Understanding time in datasets
Every Liberator dataset has two columns representing the same time in different formats.
- timestamp: a string-format timestamp in New York time (the America/New_York time zone).
- muts: Microseconds Unix Timestamp. An integer representation of a timestamp with microsecond precision that can be compared directly to other timestamps.
Both fields are filled in for every row of data. They are never null.
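The relationship between the two columns can be sketched in Python as follows. This is a minimal illustration, not Liberator code: the helper names muts_to_new_york and new_york_to_muts are hypothetical, and the sketch assumes muts counts microseconds since the Unix epoch (1970-01-01 00:00:00 UTC). The exact string layout of the timestamp column is not specified here, so the example prints an ISO 8601 string purely for readability.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

NEW_YORK = ZoneInfo("America/New_York")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def muts_to_new_york(muts: int) -> datetime:
    """Hypothetical helper: microseconds since the Unix epoch -> aware New York datetime."""
    return (EPOCH + timedelta(microseconds=muts)).astimezone(NEW_YORK)


def new_york_to_muts(dt: datetime) -> int:
    """Hypothetical helper: aware datetime -> integer microseconds since the Unix epoch."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(microseconds=1)


if __name__ == "__main__":
    muts = 1577999100000000
    dt = muts_to_new_york(muts)
    print(dt.isoformat())
    # muts values are plain integers, so rows can be ordered or compared directly.
    print(new_york_to_muts(dt) == muts)
```

Because muts is a plain integer, two rows can be compared with ordinary integer comparison, without parsing strings or handling daylight saving time.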
The Earliest Time The Row Of Data Could Have Been Known
These time attributes represent the earliest possible time that the given row of data could have been known.
Often datasets have only a date column. For example, you regularly see daily bar data with a column that represents the day to which the Open, High, Low, and Close values for a given security apply. The data might look like this:
day | symbol | open | high | low | close |
1/2/2020 | AAPL | $ 296.24 | $ 300.60 | $ 295.19 | $ 300.35 |
1/3/2020 | AAPL | $ 297.15 | $ 300.58 | $ 296.50 | $ 297.43 |
1/7/2020 | AAPL | $ 299.84 | $ 300.90 | $ 297.48 | $ 298.39 |
1/8/2020 | AAPL | $ 297.16 | $ 304.44 | $ 297.16 | $ 303.19 |
1/9/2020 | AAPL | $ 307.23 | $ 310.43 | $ 306.20 | $ 309.63 |
In this case, the earliest time that the high, low, and close values could have been known was shortly after the market closed at 4:00 pm (16:00) New York time. Closing prints often take up to 5 minutes to be reported by the exchange. Therefore, CloudQuant will typically set (a.k.a. nudge) the timestamp to the date in the day column plus 16:05 hours, that is, 4:05 pm New York time on that day.
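The nudge described above can be sketched as follows. This is only an illustration under stated assumptions, not CloudQuant's actual implementation: the function names nudged_timestamp and to_muts are hypothetical, the day column is assumed to use month/day/year strings as in the sample table, and muts is assumed to count microseconds since the Unix epoch.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

NEW_YORK = ZoneInfo("America/New_York")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NUDGE_TIME = time(16, 5)  # 16:00 market close plus 5 minutes for closing prints


def nudged_timestamp(day: str) -> datetime:
    """Hypothetical helper: 'M/D/YYYY' day string -> 16:05 New York time on that day."""
    d = datetime.strptime(day, "%m/%d/%Y").date()
    return datetime.combine(d, NUDGE_TIME, tzinfo=NEW_YORK)


def to_muts(dt: datetime) -> int:
    """Hypothetical helper: aware datetime -> integer microseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)


if __name__ == "__main__":
    for day in ["1/2/2020", "1/3/2020", "1/7/2020"]:
        ts = nudged_timestamp(day)
        print(day, ts.isoformat(), to_muts(ts))
```

Attaching the time zone when the datetime is built means the UTC offset follows daylight saving time automatically, so 16:05 stays 16:05 local time all year.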
Sources of Timestamps and Muts
Timestamps come from multiple sources.
- The row of data itself (as shown above)
- The time the row of data was recorded when sent to CloudQuant's Liberator using a streaming API
- A file timestamp from the underlying file that feeds Liberator
- A database timestamp
- The download completion time