> ## Documentation Index
> Fetch the complete documentation index at: https://knowledge.cloudquant.com/llms.txt
> Use this file to discover all available pages before exploring further.

# A Merge or Join

> Combine two DataFrames by columns using pandas merge, join, and merge_asof operations.

# A merge or join

A merge or join operation expands the number of **columns** in a resulting DataFrame by combining data from two tables based on shared values. This differs from concatenation, which combines **rows**.

## Merge vs. concat

| Operation      | What it does                                                                                                         |
| -------------- | -------------------------------------------------------------------------------------------------------------------- |
| **CONCAT**     | Combines rows from two DataFrames (e.g., two 10-row DataFrames yield 20 rows). Generally requires identical columns. |
| **MERGE/JOIN** | Combines columns by matching rows on a shared column, typically increasing column count.                             |

## pd.merge()

Merging requires a **left** dataset, a **right** dataset, and a common column specified with the `on` parameter.

### Join types

The `how` parameter controls which rows appear in results:

| Join Type | Behavior                                                                 |
| --------- | ------------------------------------------------------------------------ |
| `inner`   | **(DEFAULT)** Only rows where the merge column exists in both DataFrames |
| `left`    | All rows from the left DataFrame; `NaN` where right data is missing      |
| `right`   | All rows from the right DataFrame; `NaN` where left data is missing      |
| `outer`   | All rows from both DataFrames; `NaN` where matches do not exist          |

<Tip>
  Use `indicator=True` to add a column showing the merge source (`'both'`, `'left_only'`, `'right_only'`) — especially useful when debugging outer joins.
</Tip>

### Basic examples

```python theme={null}
# Standard merge on a shared column
res = pd.merge(left, right, on='shared_column')

# Left join
res = pd.merge(left, right, on='shared_column', how='left')

# Merge with different column names
res = pd.merge(left, right, left_on='leftcol2',
               right_on='rightcol7', how='left')

# Select specific columns from right before merging
res = pd.merge(left, right[['col1', 'col2', 'col3']],
               on='shared_column')

# Check overlap before merging
left['datesym'].isin(right['datesym']).value_counts()
```

## merge\_asof()

`merge_asof` offers additional flexibility for time-based merging:

* Merging on multiple values
* Specifying tolerance thresholds

### Example: matching quotes to trades

```python theme={null}
pd.merge_asof(trades, quotes, on='time', by='ticker')

# With tolerance
pd.merge_asof(trades, quotes, on='time', by='ticker',
              tolerance=pd.Timedelta('2ms'))
```

<Note>
  If `merge_asof()` fails, try removing the `tolerance` parameter.
</Note>

## Reference

* [pandas.merge\_asof documentation](https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html)
