Causal Revenue Analytics Engine - Period-over-Period Analysis

8 minute read

In this project I built an automated causal revenue analytics system that compares two time periods to identify why revenue changed, which products drove the change, and what actions to take. The system combines pandas-based decomposition with LLM-powered insights.

00. Project Overview
01. Two-Stage Architecture
02. Stage 1: Revenue Decomposition Engine
03. Stage 2: LLM Reporter
- Prompt Engineering
- Robust JSON Parsing
04. PVM Methodology Deep Dive
05. Quadrant Logic & Actions
06. Example Output
07. Configuring English Output
08. Next Steps

Project Overview

Business Problem

Finance and business intelligence teams need to answer: “Why did revenue change between two periods?”

Traditional approaches either:

Provide high-level totals without causality (dashboards showing +13M but not why)
Require manual SQL queries and spreadsheet analysis (time-consuming, error-prone)
Lack actionable recommendations (numbers without next steps)

The goal was to build an automated system that:

Decomposes revenue changes into Price, Volume, and Mix effects
Identifies which products drove each effect
Classifies products into decision quadrants (growth vs. decline)
Generates executive summaries in natural language
Provides product-level actions (keep, scale promo, delist, etc.)

Solution Architecture

The solution uses a two-stage pipeline:

Stage 1: Pandas Analyzer (analyzer_from_csv_v2.py)

Loads transaction data from CSV
Computes revenue bridge (appearing/disappearing/common products)
Performs PVM decomposition on common products
Classifies products into quadrants
Computes margin impacts and basket metrics
Outputs structured JSON

Stage 2: LLM Reporter (llm_v2.py)

Reads JSON from Stage 1
Uses Llama 3.1 (8B, quantized) to generate insights
Renders human-readable report with tables

Key Features

Revenue Bridge: Separates appearing, disappearing, and common products
PVM Decomposition: Price, Volume, and Mix effects
Quadrant Classification: D1-D4 based on price/volume dynamics
Customer Segmentation: Repeat vs. one-time buyer analysis
Margin Analysis: Net margin tracking at product level
LLM Insights: Natural language summaries and recommendations
Action Framework: 5 decision buckets (KEEP, PROMO, PRICE TEST, FIX, DELIST)

Two-Stage Architecture

Stage 1: Analyzer (Pure Computation)
    CSV Data
        ↓
    Revenue Bridge
    PVM Decomposition
    Quadrant Classification
        ↓
    llm_input_v2.json
        ↓
Stage 2: Reporter (LLM + Rendering)
    Load JSON
        ↓
    Generate Insights (LLM)
        ↓
    Render Report
        ↓
    weekly_summary_v2.txt

Why two stages?

Auditability: All math is deterministic (Stage 1)
Modularity: Can run Stage 1 alone for APIs
LLM Independence: Can swap or skip LLM
Performance: Stage 1 (~2-5s), Stage 2 (~10-15s)

Stage 1: Revenue Decomposition

Revenue Bridge Analysis

The bridge separates three product universes:

def compute_presence(ct2_pack_weekly: pd.DataFrame) -> pd.DataFrame:
    p = ct2_pack_weekly.copy()
    p["in_t1"] = (p["q1"] != 0) | (p["r1"] != 0)
    p["in_t2"] = (p["q2"] != 0) | (p["r2"] != 0)
    return p[["CT2_pack", "in_t1", "in_t2"]]

def compute_revenue_bridge_summary(ct2_pack_weekly, presence):
    w = ct2_pack_weekly.merge(presence, on="CT2_pack")

    # Appearing: only in T2
    appearing = w.loc[(~w["in_t1"]) & (w["in_t2"]), "r2"].sum()

    # Disappearing: only in T1
    disappearing = w.loc[(w["in_t1"]) & (~w["in_t2"]), "r1"].sum()

    # Common: in both periods
    common_mask = (w["in_t1"]) & (w["in_t2"])
    common_delta = (w.loc[common_mask, "r2"] - w.loc[common_mask, "r1"]).sum()

    total_delta = appearing - disappearing + common_delta

    return {"appearing": appearing,
            "disappearing": disappearing,
            "common_delta": common_delta,
            "total_delta": total_delta}

Bridge Formula:

ΔR_total = R_appearing - R_disappearing + ΔR_common

Price-Volume-Mix Decomposition

For common products, decompose revenue change:

def compute_common_pvm(ct2_pack_weekly, presence):
    common = ct2_pack_weekly.merge(presence)
    common = common[(common["in_t1"]) & (common["in_t2"])]

    # Priced products (can compute unit prices)
    priced = common[(common["q1"] != 0) & (common["q2"] != 0) &
                    common["p1"].notna() & common["p2"].notna()]

    # Volume effect: p1 * (q2 - q1)
    volume = (priced["p1"] * (priced["q2"] - priced["q1"])).sum()

    # Price effect: q2 * (p2 - p1)
    price = (priced["q2"] * (priced["p2"] - priced["p1"])).sum()

    # Mix: residual
    delta_common = (common["r2"] - common["r1"]).sum()
    mix = delta_common - volume - price

    return {"volume": volume, "price": price, "mix": mix}

PVM Formula:

ΔR_common = Volume + Price + Mix

Volume = Σ[p1 * (q2 - q1)]  (quantity change at T1 price)
Price  = Σ[q2 * (p2 - p1)]  (price change at T2 quantity)
Mix    = residual           (product mix + unpriced)

Quadrant Classification

Classify products by price/volume direction:

def compute_pack_quadrant(ct2_pack_weekly):
    w = ct2_pack_weekly.copy()

    def quad(row):
        p1, p2, q1, q2 = row["p1"], row["p2"], row["q1"], row["q2"]

        if pd.isna(p1) or pd.isna(p2):
            return "Neutral"

        if (p2 > p1) and (q2 > q1): return "D1: Price↑ Volume↑"
        if (p2 > p1) and (q2 < q1): return "D2: Price↑ Volume↓"
        if (p2 < p1) and (q2 > q1): return "D3: Price↓ Volume↑"
        if (p2 < p1) and (q2 < q1): return "D4: Price↓ Volume↓"
        return "Neutral"

    w["quadrant"] = w.apply(quad, axis=1)
    return w

Quadrants:

Code	Price	Volume	Meaning	Action
D1	↑	↑	Growth	KEEP/SCALE
D2	↑	↓	Margin risk	PRICE TEST
D3	↓	↑	Promo success	SCALE PROMO
D4	↓	↓	Double hit	FIX/DELIST

Stage 2: LLM Reporter

Prompt Engineering

The LLM generates structured insights:

def llama_story_v2(llm, llm_input):
    system_message = (
        "You are a senior performance analyst. "
        "Write a weekly causal summary in ENGLISH.\n\n"
        "STRICT RULES:\n"
        "1) Use ONLY numbers from the input\n"
        "2) Do NOT invent metrics\n"
        "3) Output ONLY valid JSON\n"
        "4) Focus on: WHAT (ΔR), WHY (PVM), WHERE (Quadrants), ACTIONS\n"
    )

    user_message = (
        "Output JSON with these fields:\n"
        '{"headline": string, "executive_summary": [string], '
        '"story": string, "top_findings": [string], '
        '"recommended_actions": [string]}\n\n'
        "INPUT:\n" + json.dumps(payload)
    )

    resp = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message}
        ],
        temperature=0.0,
        max_tokens=1400
    )

    return json_loads_robust(resp["choices"][0]["message"]["content"])

Prompt Techniques:

Structured Output: Exact JSON schema
Constraints: “Use ONLY input numbers” prevents hallucination
Focus: “WHAT, WHY, WHERE, ACTIONS”
Temperature: 0.0 for reproducibility

Robust JSON Parsing

Handle LLM output reliably:

def json_loads_robust(text):
    try:
        return json.loads(text)
    except:
        # Extract first complete JSON object
        extracted = extract_first_json_object(text)
        if extracted:
            return json.loads(extracted)
        # Last resort: repair with LLM
        return repair_json_with_llm(llm, text)

def extract_first_json_object(text):
    start = text.find("{")
    if start < 0: return None

    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{": depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i+1]
    return None

PVM Methodology Deep Dive

Price-Volume-Mix decomposition requires careful universe definition:

Universe Types

Common Products (Bridge):

Has any revenue/quantity in T1 AND T2
Ensures bridge reconciliation

Priced Products (PVM):

Has non-zero quantity in T1 AND T2
Unit prices p1, p2 computable
Gets Volume and Price effects

Unpriced Products:

Common but q1=0 or q2=0
Goes into Mix effect

Volume Effect

Answers: “At old prices, what would new quantity generate?”

volume_effect = p1 * (q2 - q1)

Positive Volume: Sold more units (good) Negative Volume: Sold fewer units (concerning)

Price Effect

Answers: “At new quantity, what did price change contribute?”

price_effect = q2 * (p2 - p1)

Positive Price: Higher prices (margin opportunity) Negative Price: Lower prices (promotional/competitive)

Mix Effect

Residual capturing:

Product mix shifts
Unpriced product changes
Cross-effects

mix_effect = delta_common - volume - price

Typically 5-15% of total ΔR.

Quadrant Logic & Actions

D1: Price↑ Volume↑ (Growth Zone)

Meaning: More units at higher prices Cause: Strong product-market fit Action: KEEP / PRICE-UP / SCALE

if pe > 0 and ve > 0:
    if delta_margin > 0:
        return "KEEP / PRICE-UP"
    return "KEEP (watch margin)"

D2: Price↑ Volume↓ (Margin Risk)

Meaning: Higher price, fewer units Cause: Aggressive pricing or competitor pressure Action: PRICE DOWN TEST

Warning: If margin is negative, price increase backfired.

D3: Price↓ Volume↑ (Promotional)

Meaning: Lower price driving volume Cause: Active promotions Action: PROMO SCALE (if margin positive) or PROMO STOP (if negative)

D4: Price↓ Volume↓ (Double Hit)

Meaning: Lower price AND fewer units Cause: Product becoming obsolete or availability issues Action: FIX AVAILABILITY or DELIST

if pe < 0 and ve < 0:
    if in_stock_flag == 0:
        return "FIX AVAILABILITY"
    return "DELIST"

Example Output

Running on Nov 2025 vs Dec 2025 data:

Weekly Causal Summary

Periods: t1 = 2025-11-01 – 2025-11-30 | t2 = 2025-12-01 – 2025-12-31

Summary
 - Total revenue: t1 = 204,382,935 Ft
                  t2 = 217,341,392 Ft
                  → Δ = 12,958,457 Ft

Executive Summary
 Revenue grew by 13M Ft driven by common products (+12M). Volume effect
 dominated at +14M Ft (order count +917), offset by Price (-1.4M) and Mix
 (-0.3M). Growth zone (D1) led with 153 products. Repeat buyers contributed
 87% of volume growth. Top product: Biofinity (6db) at +3.1M Ft.

Revenue Bridge
 - Appearing (t2-only): 35,511,503 Ft
 - Disappearing (t1-only): 34,924,704 Ft
 - Common products Δ: 12,371,659 Ft
 - Total Δ: 12,958,458 Ft

PVM Decomposition
 - ΔR (common): 12,371,659 Ft
 - Volume effect: 14,094,044 Ft  [(q2−q1) × p1]
 - Price effect: -1,383,767 Ft  [q2 × (p2−p1)]
 - Mix effect: -338,617 Ft      [residual]

Volume Breakdown
 - Order count: t1=12,252, t2=13,169, Δ=917
 - Basket size: t1=2.704, t2=2.689, Δ=-0.015
 - Interpretation: order-driven growth

 - Repeat buyers: 86.8% of volume effect
   Top products: Avaira Vitality (6db) +5M, Biofinity (6db) +3M

 - One-time buyers: 13.2% of volume effect
   Top products: RB_4165_60171 +468K

Quadrant Analysis
 - D1 (Growth Zone): 153 products, +12.9M Ft impact
   Top: Biofinity (6db) +3.1M
   Action: KEEP / PRICE-UP

 - D2 (Margin Risk): 124 products, -7.0M Ft impact
   Top: Biofinity (3db) -1.0M
   Action: PRICE DOWN TEST

 - D3 (Promotional): 122 products, +12.5M Ft impact
   Top: Avaira Vitality (6db) +3.8M
   Action: PROMO SCALE

 - D4 (Double Hit): 94 products, -5.7M Ft impact
   Top: Dailies Total 1 (90db) -1.3M
   Action: DELIST / FIX AVAILABILITY

Key Insights:

Volume drove growth (+14M), Price slightly negative
Repeat buyers = 87% of volume growth
D1 (growth zone) had most products and impact
Top single product: Biofinity 6-pack (+3.1M Ft)

Configuring English Output

The system was originally Hungarian. To produce English:

Step 1: Update LLM prompts

In llm_v2.py, modify:

system_message = (
    "You are a senior performance analyst. "
    "Write a weekly causal summary in ENGLISH.\n"  # ← Add this
    # ...
)

Step 2: Translate static text

In render_human_report():

lines.append("Weekly Causal Summary")     # was: "Heti összefoglaló"
lines.append("Summary")                   # was: "Összefoglaló"
lines.append("Revenue Bridge")            # already English
lines.append("Volume effect explanation") # was: "Volume-hatás magyarázat"

Step 3: Update column headers

headers = ["Product", "Price", "Volume", "Impact", "Margin Δ", "Action"]
# was: ["CT2_pack", "Ár", "Mennyiség", "Impact", "Margin Δ", "Akció"]

Result: Native English output without manual translation.

Next Steps

Current Limitations:

Two-period only: No trend analysis
Manual date selection: User specifies T1/T2
Static inventory: Snapshot only
Text output: No visualizations

Planned Enhancements:

Multi-Period Tracking
- Store historical results in database
- Trend detection: “3rd month of D2 decline”
- Seasonality analysis
Automated Period Detection
- Auto-select recent periods
- Support WoW, MoM, YoY comparisons
Inventory Time-Series
- Daily snapshots
- Availability impact: “Out of stock 12 days in T2”
- Forecast integration
Visualization Layer
- Waterfall charts (PVM)
- Quadrant scatter plots
- Product performance heatmaps
Action Tracking
- Log recommendations with timestamps
- Track implementation and measure impact
- Close the feedback loop
Interactive Dashboard
- Streamlit UI for business users
- Drill-down capabilities
- Export to PowerPoint/PDF
Alerting System
- Email for D4 products above threshold
- Slack for significant PVM swings
- Proactive weekly summaries

This project demonstrates how deterministic analytics (PVM) combined with LLM reasoning can automate complex business intelligence while maintaining auditability.

Laszlo Feher

Data Science Portfolio