Causal Revenue Analytics Engine - Period-over-Period Analysis
In this project I built an automated causal revenue analytics system that compares two time periods to identify why revenue changed, which products drove the change, and what actions to take. The system combines pandas-based decomposition with LLM-powered insights.
Table of contents
- 00. Project Overview
- 01. Two-Stage Architecture
- 02. Stage 1: Revenue Decomposition Engine
- 03. Stage 2: LLM Reporter
- 04. PVM Methodology Deep Dive
- 05. Quadrant Logic & Actions
- 06. Example Output
- 07. Configuring English Output
- 08. Next Steps
Project Overview
Business Problem
Finance and business intelligence teams need to answer: “Why did revenue change between two periods?”
Traditional approaches either:
- Provide high-level totals without causality (dashboards showing +13M but not why)
- Require manual SQL queries and spreadsheet analysis (time-consuming, error-prone)
- Lack actionable recommendations (numbers without next steps)
The goal was to build an automated system that:
- Decomposes revenue changes into Price, Volume, and Mix effects
- Identifies which products drove each effect
- Classifies products into decision quadrants (growth vs. decline)
- Generates executive summaries in natural language
- Provides product-level actions (keep, scale promo, delist, etc.)
Solution Architecture
The solution uses a two-stage pipeline:
Stage 1: Pandas Analyzer (analyzer_from_csv_v2.py)
- Loads transaction data from CSV
- Computes revenue bridge (appearing/disappearing/common products)
- Performs PVM decomposition on common products
- Classifies products into quadrants
- Computes margin impacts and basket metrics
- Outputs structured JSON
Stage 2: LLM Reporter (llm_v2.py)
- Reads JSON from Stage 1
- Uses Llama 3.1 (8B, quantized) to generate insights
- Renders human-readable report with tables
Key Features
- Revenue Bridge: Separates appearing, disappearing, and common products
- PVM Decomposition: Price, Volume, and Mix effects
- Quadrant Classification: D1-D4 based on price/volume dynamics
- Customer Segmentation: Repeat vs. one-time buyer analysis
- Margin Analysis: Net margin tracking at product level
- LLM Insights: Natural language summaries and recommendations
- Action Framework: 5 decision buckets (KEEP, PROMO, PRICE TEST, FIX, DELIST)
Two-Stage Architecture
Stage 1: Analyzer (Pure Computation)
CSV Data
↓
Revenue Bridge
PVM Decomposition
Quadrant Classification
↓
llm_input_v2.json
↓
Stage 2: Reporter (LLM + Rendering)
Load JSON
↓
Generate Insights (LLM)
↓
Render Report
↓
weekly_summary_v2.txt
Why two stages?
- Auditability: All math is deterministic (Stage 1)
- Modularity: Can run Stage 1 alone for APIs
- LLM Independence: Can swap or skip LLM
- Performance: Stage 1 (~2-5s), Stage 2 (~10-15s)
Stage 1: Revenue Decomposition
Revenue Bridge Analysis
The bridge separates three product universes:
def compute_presence(ct2_pack_weekly: pd.DataFrame) -> pd.DataFrame:
p = ct2_pack_weekly.copy()
p["in_t1"] = (p["q1"] != 0) | (p["r1"] != 0)
p["in_t2"] = (p["q2"] != 0) | (p["r2"] != 0)
return p[["CT2_pack", "in_t1", "in_t2"]]
def compute_revenue_bridge_summary(ct2_pack_weekly, presence):
w = ct2_pack_weekly.merge(presence, on="CT2_pack")
# Appearing: only in T2
appearing = w.loc[(~w["in_t1"]) & (w["in_t2"]), "r2"].sum()
# Disappearing: only in T1
disappearing = w.loc[(w["in_t1"]) & (~w["in_t2"]), "r1"].sum()
# Common: in both periods
common_mask = (w["in_t1"]) & (w["in_t2"])
common_delta = (w.loc[common_mask, "r2"] - w.loc[common_mask, "r1"]).sum()
total_delta = appearing - disappearing + common_delta
return {"appearing": appearing,
"disappearing": disappearing,
"common_delta": common_delta,
"total_delta": total_delta}
Bridge Formula:
ΔR_total = R_appearing - R_disappearing + ΔR_common
Price-Volume-Mix Decomposition
For common products, decompose revenue change:
def compute_common_pvm(ct2_pack_weekly, presence):
common = ct2_pack_weekly.merge(presence)
common = common[(common["in_t1"]) & (common["in_t2"])]
# Priced products (can compute unit prices)
priced = common[(common["q1"] != 0) & (common["q2"] != 0) &
common["p1"].notna() & common["p2"].notna()]
# Volume effect: p1 * (q2 - q1)
volume = (priced["p1"] * (priced["q2"] - priced["q1"])).sum()
# Price effect: q2 * (p2 - p1)
price = (priced["q2"] * (priced["p2"] - priced["p1"])).sum()
# Mix: residual
delta_common = (common["r2"] - common["r1"]).sum()
mix = delta_common - volume - price
return {"volume": volume, "price": price, "mix": mix}
PVM Formula:
ΔR_common = Volume + Price + Mix
Volume = Σ[p1 * (q2 - q1)] (quantity change at T1 price)
Price = Σ[q2 * (p2 - p1)] (price change at T2 quantity)
Mix = residual (product mix + unpriced)
Quadrant Classification
Classify products by price/volume direction:
def compute_pack_quadrant(ct2_pack_weekly):
w = ct2_pack_weekly.copy()
def quad(row):
p1, p2, q1, q2 = row["p1"], row["p2"], row["q1"], row["q2"]
if pd.isna(p1) or pd.isna(p2):
return "Neutral"
if (p2 > p1) and (q2 > q1): return "D1: Price↑ Volume↑"
if (p2 > p1) and (q2 < q1): return "D2: Price↑ Volume↓"
if (p2 < p1) and (q2 > q1): return "D3: Price↓ Volume↑"
if (p2 < p1) and (q2 < q1): return "D4: Price↓ Volume↓"
return "Neutral"
w["quadrant"] = w.apply(quad, axis=1)
return w
Quadrants:
| Code | Price | Volume | Meaning | Action |
|---|---|---|---|---|
| D1 | ↑ | ↑ | Growth | KEEP/SCALE |
| D2 | ↑ | ↓ | Margin risk | PRICE TEST |
| D3 | ↓ | ↑ | Promo success | SCALE PROMO |
| D4 | ↓ | ↓ | Double hit | FIX/DELIST |
Stage 2: LLM Reporter
Prompt Engineering
The LLM generates structured insights:
def llama_story_v2(llm, llm_input):
system_message = (
"You are a senior performance analyst. "
"Write a weekly causal summary in ENGLISH.\n\n"
"STRICT RULES:\n"
"1) Use ONLY numbers from the input\n"
"2) Do NOT invent metrics\n"
"3) Output ONLY valid JSON\n"
"4) Focus on: WHAT (ΔR), WHY (PVM), WHERE (Quadrants), ACTIONS\n"
)
user_message = (
"Output JSON with these fields:\n"
'{"headline": string, "executive_summary": [string], '
'"story": string, "top_findings": [string], '
'"recommended_actions": [string]}\n\n'
"INPUT:\n" + json.dumps(payload)
)
resp = llm.create_chat_completion(
messages=[
{"role": "system", "content": system_message},
{"role": "user", "content": user_message}
],
temperature=0.0,
max_tokens=1400
)
return json_loads_robust(resp["choices"][0]["message"]["content"])
Prompt Techniques:
- Structured Output: Exact JSON schema
- Constraints: “Use ONLY input numbers” prevents hallucination
- Focus: “WHAT, WHY, WHERE, ACTIONS”
- Temperature: 0.0 for reproducibility
Robust JSON Parsing
Handle LLM output reliably:
def json_loads_robust(text):
try:
return json.loads(text)
except:
# Extract first complete JSON object
extracted = extract_first_json_object(text)
if extracted:
return json.loads(extracted)
# Last resort: repair with LLM
return repair_json_with_llm(llm, text)
def extract_first_json_object(text):
start = text.find("{")
if start < 0: return None
depth = 0
for i in range(start, len(text)):
if text[i] == "{": depth += 1
elif text[i] == "}":
depth -= 1
if depth == 0:
return text[start:i+1]
return None
PVM Methodology Deep Dive
Price-Volume-Mix decomposition requires careful universe definition:
Universe Types
Common Products (Bridge):
- Has any revenue/quantity in T1 AND T2
- Ensures bridge reconciliation
Priced Products (PVM):
- Has non-zero quantity in T1 AND T2
- Unit prices p1, p2 computable
- Gets Volume and Price effects
Unpriced Products:
- Common but q1=0 or q2=0
- Goes into Mix effect
Volume Effect
Answers: “At old prices, what would new quantity generate?”
volume_effect = p1 * (q2 - q1)
Positive Volume: Sold more units (good) Negative Volume: Sold fewer units (concerning)
Price Effect
Answers: “At new quantity, what did price change contribute?”
price_effect = q2 * (p2 - p1)
Positive Price: Higher prices (margin opportunity) Negative Price: Lower prices (promotional/competitive)
Mix Effect
Residual capturing:
- Product mix shifts
- Unpriced product changes
- Cross-effects
mix_effect = delta_common - volume - price
Typically 5-15% of total ΔR.
Quadrant Logic & Actions
D1: Price↑ Volume↑ (Growth Zone)
Meaning: More units at higher prices Cause: Strong product-market fit Action: KEEP / PRICE-UP / SCALE
if pe > 0 and ve > 0:
if delta_margin > 0:
return "KEEP / PRICE-UP"
return "KEEP (watch margin)"
D2: Price↑ Volume↓ (Margin Risk)
Meaning: Higher price, fewer units Cause: Aggressive pricing or competitor pressure Action: PRICE DOWN TEST
Warning: If margin is negative, price increase backfired.
D3: Price↓ Volume↑ (Promotional)
Meaning: Lower price driving volume Cause: Active promotions Action: PROMO SCALE (if margin positive) or PROMO STOP (if negative)
D4: Price↓ Volume↓ (Double Hit)
Meaning: Lower price AND fewer units Cause: Product becoming obsolete or availability issues Action: FIX AVAILABILITY or DELIST
if pe < 0 and ve < 0:
if in_stock_flag == 0:
return "FIX AVAILABILITY"
return "DELIST"
Example Output
Running on Nov 2025 vs Dec 2025 data:
Weekly Causal Summary
Periods: t1 = 2025-11-01 – 2025-11-30 | t2 = 2025-12-01 – 2025-12-31
Summary
- Total revenue: t1 = 204,382,935 Ft
t2 = 217,341,392 Ft
→ Δ = 12,958,457 Ft
Executive Summary
Revenue grew by 13M Ft driven by common products (+12M). Volume effect
dominated at +14M Ft (order count +917), offset by Price (-1.4M) and Mix
(-0.3M). Growth zone (D1) led with 153 products. Repeat buyers contributed
87% of volume growth. Top product: Biofinity (6db) at +3.1M Ft.
Revenue Bridge
- Appearing (t2-only): 35,511,503 Ft
- Disappearing (t1-only): 34,924,704 Ft
- Common products Δ: 12,371,659 Ft
- Total Δ: 12,958,458 Ft
PVM Decomposition
- ΔR (common): 12,371,659 Ft
- Volume effect: 14,094,044 Ft [(q2−q1) × p1]
- Price effect: -1,383,767 Ft [q2 × (p2−p1)]
- Mix effect: -338,617 Ft [residual]
Volume Breakdown
- Order count: t1=12,252, t2=13,169, Δ=917
- Basket size: t1=2.704, t2=2.689, Δ=-0.015
- Interpretation: order-driven growth
- Repeat buyers: 86.8% of volume effect
Top products: Avaira Vitality (6db) +5M, Biofinity (6db) +3M
- One-time buyers: 13.2% of volume effect
Top products: RB_4165_60171 +468K
Quadrant Analysis
- D1 (Growth Zone): 153 products, +12.9M Ft impact
Top: Biofinity (6db) +3.1M
Action: KEEP / PRICE-UP
- D2 (Margin Risk): 124 products, -7.0M Ft impact
Top: Biofinity (3db) -1.0M
Action: PRICE DOWN TEST
- D3 (Promotional): 122 products, +12.5M Ft impact
Top: Avaira Vitality (6db) +3.8M
Action: PROMO SCALE
- D4 (Double Hit): 94 products, -5.7M Ft impact
Top: Dailies Total 1 (90db) -1.3M
Action: DELIST / FIX AVAILABILITY
Key Insights:
- Volume drove growth (+14M), Price slightly negative
- Repeat buyers = 87% of volume growth
- D1 (growth zone) had most products and impact
- Top single product: Biofinity 6-pack (+3.1M Ft)
Configuring English Output
The system was originally Hungarian. To produce English:
Step 1: Update LLM prompts
In llm_v2.py, modify:
system_message = (
"You are a senior performance analyst. "
"Write a weekly causal summary in ENGLISH.\n" # ← Add this
# ...
)
Step 2: Translate static text
In render_human_report():
lines.append("Weekly Causal Summary") # was: "Heti összefoglaló"
lines.append("Summary") # was: "Összefoglaló"
lines.append("Revenue Bridge") # already English
lines.append("Volume effect explanation") # was: "Volume-hatás magyarázat"
Step 3: Update column headers
headers = ["Product", "Price", "Volume", "Impact", "Margin Δ", "Action"]
# was: ["CT2_pack", "Ár", "Mennyiség", "Impact", "Margin Δ", "Akció"]
Result: Native English output without manual translation.
Next Steps
Current Limitations:
- Two-period only: No trend analysis
- Manual date selection: User specifies T1/T2
- Static inventory: Snapshot only
- Text output: No visualizations
Planned Enhancements:
- Multi-Period Tracking
- Store historical results in database
- Trend detection: “3rd month of D2 decline”
- Seasonality analysis
- Automated Period Detection
- Auto-select recent periods
- Support WoW, MoM, YoY comparisons
- Inventory Time-Series
- Daily snapshots
- Availability impact: “Out of stock 12 days in T2”
- Forecast integration
- Visualization Layer
- Waterfall charts (PVM)
- Quadrant scatter plots
- Product performance heatmaps
- Action Tracking
- Log recommendations with timestamps
- Track implementation and measure impact
- Close the feedback loop
- Interactive Dashboard
- Streamlit UI for business users
- Drill-down capabilities
- Export to PowerPoint/PDF
- Alerting System
- Email for D4 products above threshold
- Slack for significant PVM swings
- Proactive weekly summaries
This project demonstrates how deterministic analytics (PVM) combined with LLM reasoning can automate complex business intelligence while maintaining auditability.