A Merge or Join
A merge or join operation expands the number of columns in a resulting DataFrame by combining data from two tables based on shared values. This differs from concatenation, which combines rows.
Merge vs. Concat
| Operation | What it does |
|---|
| CONCAT | Combines rows from two DataFrames (e.g., two 10-row DataFrames yield 20 rows). Generally requires identical columns. |
| MERGE/JOIN | Combines columns by matching rows on a shared column, typically increasing column count. |
pd.merge()
Merging requires a left dataset, a right dataset, and a common column specified with the on parameter.
Join Types
The how parameter controls which rows appear in results:
| Join Type | Behavior |
|---|
inner | (DEFAULT) Only rows where the merge column exists in both DataFrames |
left | All rows from the left DataFrame; NaN where right data is missing |
right | All rows from the right DataFrame; NaN where left data is missing |
outer | All rows from both DataFrames; NaN where matches do not exist |
Use indicator=True to add a column showing the merge source ('both', 'left_only', 'right_only') — especially useful when debugging outer joins.
Basic Examples
# Standard merge on a shared column
res = pd.merge(left, right, on='shared_column')
# Left join
res = pd.merge(left, right, on='shared_column', how='left')
# Merge with different column names
res = pd.merge(left, right, left_on='leftcol2',
right_on='rightcol7', how='left')
# Select specific columns from right before merging
res = pd.merge(left, right[['col1', 'col2', 'col3']],
on='shared_column')
# Check overlap before merging
left['datesym'].isin(right['datesym']).value_counts()
merge_asof()
merge_asof offers additional flexibility for time-based merging:
- Merging on multiple values
- Specifying tolerance thresholds
Example: Matching Quotes to Trades
pd.merge_asof(trades, quotes, on='time', by='ticker')
# With tolerance
pd.merge_asof(trades, quotes, on='time', by='ticker',
tolerance=pd.Timedelta('2ms'))
If merge_asof() fails, try removing the tolerance parameter.
Reference