
Aligning two datasets into one

Merging two different time series datasets into one can feel like navigating a minefield; it's one of the trickiest challenges in data science.

Even if both datasets are timestamped daily, you still need to know when each value actually became available to make sure the rows line up correctly.
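One way to handle this in pandas is to store the time each value became available and join on that with `pd.merge_asof`. This is a minimal sketch under assumed data: the `prices` and `fundamentals` tables, the `available_at` column and all values are hypothetical. With `direction="backward"`, each price row picks up the most recent value whose `available_at` is on or before that row's date, so nothing is attached to a date before it was published.

```python
import pandas as pd

# Hypothetical data: daily closes, plus one fundamentals value that (we assume)
# was only published on 2024-01-04, even if it describes an earlier period.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
    "close": [100.0, 101.5, 99.8, 102.3],
})
fundamentals = pd.DataFrame({
    "available_at": pd.to_datetime(["2024-01-04"]),
    "eps": [1.25],
})

# merge_asof needs both join keys sorted.
# direction="backward" takes the last row with available_at <= date.
aligned = pd.merge_asof(
    prices.sort_values("date"),
    fundamentals.sort_values("available_at"),
    left_on="date",
    right_on="available_at",
    direction="backward",
)
print(aligned[["date", "close", "eps"]])
```

Here `eps` is empty on 2024-01-02 and 2024-01-03 and only appears from 2024-01-04, which is the point: the value is used only once it could actually have been known.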


Do both datasets contain the same symbols? If not, what do I do with the symbols that appear in only one of them?
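A quick way to see which symbols don't match is an outer merge with `indicator=True`, which adds a `_merge` column saying whether each row came from the left table, the right table, or both. The two small tables below are made up for illustration; whether you then keep the unmatched symbols (outer join) or drop them (inner join) depends on your use case.

```python
import pandas as pd

# Hypothetical datasets with partially overlapping symbols.
prices = pd.DataFrame({"symbol": ["AAPL", "MSFT", "GOOG"], "close": [190.0, 410.0, 140.0]})
ratings = pd.DataFrame({"symbol": ["AAPL", "MSFT", "TSLA"], "rating": ["buy", "hold", "sell"]})

# Outer join keeps every symbol from both sides; _merge records where each row came from.
merged = prices.merge(ratings, on="symbol", how="outer", indicator=True)
print(merged)

# Symbols present in only one dataset, to review before deciding what to do with them.
missing = merged.loc[merged["_merge"] != "both", ["symbol", "_merge"]]
print(missing)

# If only the shared symbols matter, an inner join drops the rest.
common = prices.merge(ratings, on="symbol", how="inner")
print(common)
```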


Is one dataset at 1-minute intervals and the other at 5-minute intervals? How do I align those?


If I keep the 1-minute data as the index, then when I expand the 5-minute data, do I copy the first value into the new rows, the last value, or some other value?
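As a sketch of the expansion, you can reindex the 5-minute data onto the 1-minute index and pick a fill method. This assumes each bar's timestamp marks the bar's close, and the column names and values are hypothetical. Forward fill copies the last 5-minute value already known at each minute; back fill copies the next bar's value, which comes from the future relative to that minute, so treat it with care in anything like a backtest.

```python
import pandas as pd

# Hypothetical bars; we assume timestamps label the bar close.
idx_1m = pd.date_range("2024-01-02 09:31", periods=10, freq="1min")
df_1m = pd.DataFrame({"close_1m": range(10)}, index=idx_1m)

idx_5m = pd.date_range("2024-01-02 09:35", periods=2, freq="5min")
df_5m = pd.DataFrame({"close_5m": [100.0, 101.0]}, index=idx_5m)

# Forward fill: each minute gets the last 5-minute value known at that time.
# Minutes before the first 5-minute bar stay NaN.
last_known = df_5m.reindex(df_1m.index, method="ffill")

# Back fill: each minute gets the value of the next 5-minute bar (look-ahead).
next_bar = df_5m.reindex(df_1m.index, method="bfill")

combined = df_1m.join(last_known)
print(combined)
```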


If I merge the other way and keep the 5-minute data as the index, the 1-minute bars have to be summarized. Do I take the average, max, min, or something else? It all depends on what each column contains.
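For price bars, one common choice (not the only one) is a different rule per column: first open, highest high, lowest low, last close and total volume. The sketch below uses `resample(...).agg()` with a per-column mapping on hypothetical 1-minute OHLCV data. It assumes timestamps label the bar close, so `closed="right", label="right"` makes the 09:35 bar cover 09:31 through 09:35.

```python
import pandas as pd

# Hypothetical 1-minute OHLCV bars; timestamps assumed to label the bar close.
idx = pd.date_range("2024-01-02 09:31", periods=10, freq="1min")
df_1m = pd.DataFrame({
    "open":   [10, 11, 12, 11, 13, 14, 13, 15, 16, 15],
    "high":   [11, 12, 13, 12, 14, 15, 14, 16, 17, 16],
    "low":    [9, 10, 11, 10, 12, 13, 12, 14, 15, 14],
    "close":  [11, 12, 11, 13, 14, 13, 15, 16, 15, 17],
    "volume": [100, 200, 150, 120, 300, 250, 180, 220, 210, 190],
}, index=idx)

# Summarize each column with its own rule.
df_5m = df_1m.resample("5min", closed="right", label="right").agg({
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
})
print(df_5m)
```

If your bars are labelled by their open time instead, the pandas defaults for 5-minute bins (`closed="left"`, `label="left"`) are usually the better fit.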


Pandas has a lot of tricks and functions to help you merge datasets. This could be a whole week's course in data science, but we can at least point you to some helpful commands…


df.reindex() conforms a DataFrame to a new index and lets you fill the gaps with forward fill, back fill, or the nearest value, or leave them empty (method=None, the default).
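To compare the fill options side by side, here is a small sketch on a made-up 5-minute series reindexed to a few target timestamps. `method` accepts `None`, `"ffill"`, `"bfill"` and `"nearest"`, and requires the index being reindexed to be sorted. The optional `tolerance` argument caps how far a fill may reach; the 2-minute limit here is just an example value.

```python
import pandas as pd

# Hypothetical series sampled every 5 minutes.
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2024-01-02 09:30", periods=3, freq="5min"))
target = pd.DatetimeIndex([
    "2024-01-02 09:30", "2024-01-02 09:32", "2024-01-02 09:34", "2024-01-02 09:40",
])

results = pd.DataFrame({
    "none": s.reindex(target),                        # exact matches only, else NaN
    "ffill": s.reindex(target, method="ffill"),       # last value at or before each time
    "bfill": s.reindex(target, method="bfill"),       # next value at or after each time
    "nearest": s.reindex(target, method="nearest"),   # closest timestamp on either side
    # forward fill, but only if the source value is at most 2 minutes old
    "ffill_2min": s.reindex(target, method="ffill", tolerance=pd.Timedelta("2min")),
})
print(results)
```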


If both datasets are in Liberator, you can use the SuperQuery command to have Liberator bring them together for you (on the next page!).


If you are stuck, reach out: we do this kind of work daily and probably have something that can help you.


https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reindex.html