Skip to main content

Summarizing Data with a Pivot Table

A pivot table is a data analysis tool that enables you to summarize, reorganize, and explore large datasets by aggregating data, rotating perspectives, filtering, sorting, and grouping information into meaningful categories.

Key Benefits

  • Simplify Complex Data — Break down large datasets into manageable summaries
  • Identify Trends — Reveal patterns not immediately obvious in raw data
  • Flexible Analysis — Quickly change layout and structure to answer different questions
  • Efficiency — Automate data analysis for faster, regular reporting

Pandas Pivot Table Syntax

import pandas as pd

pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc='sum')

Practical Example: Analyzing Odd Lots in AAPL Trades

This example demonstrates analyzing AAPL trade data by:
  1. Extracting trades between 9:30 AM and 4 PM
  2. Creating an odd_lots boolean column (shares < 100)
  3. Extracting hour from timestamp
  4. Pivoting to count shares by hour and lot size
  5. Calculating percentage of odd lots per hour
import liberator

%time df = liberator.get_dataframe(
    liberator.query(
        name='trades',
        symbols='AAPL',
        as_of='2024-07-30 15:59:58',
        back_to='2024-07-30 09:30:01'
    )
)

df['odd_lots'] = df['shares'] < 100

df['Hour'] = df.apply(lambda row: str(row.timestamp)[11:13], axis=1)

pivot_table = df.pivot_table(
    values='shares',
    index='Hour',
    columns='odd_lots',
    aggfunc='sum',
    fill_value=0
)

pivot_table['% Odd Lots'] = (
    pivot_table[True] / (pivot_table[True] + pivot_table[False]) * 100
)

pivot_table

Parameter Reference

ParameterPurpose
aggfuncSpecifies aggregation function ('sum', 'mean', etc.)
valuesTarget column for aggregation
indexColumn(s) defining row groupings
columnsColumn(s) defining column splits
fill_valueDefault value for missing data