
Real-Time Portfolio Risk Analytics

A streaming financial risk platform: Kafka-driven market data ingestion, Apache Spark Structured Streaming for rolling VaR calculations, FastAPI for risk query endpoints, and a Streamlit dashboard with live P&L and exposure breakdowns. Built to process multi-asset portfolios with sub-second risk metric updates.

Highlights: Sub-second VaR updates · Multi-asset portfolios · Historical simulation VaR · Live P&L tracking · REST risk API · Streamlit dashboard

Stack: Apache Kafka · Apache Spark · FastAPI · Streamlit · Python · Docker · Pandas · NumPy · PostgreSQL · Redis

Key metrics:
  • <1s VaR update latency
  • 500+ price ticks/sec
  • 95% VaR confidence level
  • 30-day historical window

The Problem

Traditional risk systems compute Value at Risk (VaR) overnight. A portfolio manager who changes positions intraday doesn't know their updated risk exposure until the next morning. In volatile markets, that lag is not a workflow inconvenience - it's a risk management failure.

The secondary problem is architecture: financial data arrives as a firehose of price ticks. Computing portfolio-level VaR from streaming tick data requires joining live prices against position records, maintaining rolling historical windows, and running Monte Carlo or historical simulation on each update. A batch system is fundamentally the wrong tool for that job; streaming with Spark turns it into a solvable problem.

System Architecture

The pipeline has four distinct layers, with Kafka as the central nervous system.

  • Market data ingestion - price tick simulator → Kafka topic market_data
  • Risk computation - Spark Structured Streaming → rolling VaR engine
  • Storage + API layer - PostgreSQL (risk snapshots), Redis (live position cache), FastAPI (risk endpoints)
  • Visualization - Streamlit dashboard with live P&L and exposure breakdowns

Market data ingestion

A Kafka producer simulates realistic equity price ticks: mid-price with configurable bid-ask spread, volume weighting, and random-walk drift. It produces to a market_data topic partitioned by ticker symbol, which guarantees ordering within each instrument. Tested to a sustained 500+ ticks/second.
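
A minimal sketch of the producer loop, assuming kafka-python and a broker on localhost:9092. The market_data topic name comes from the design above; the tickers, spread, and random-walk parameters are illustrative.

```python
# Tick producer sketch. Keying each message by ticker means the default
# partitioner hashes the key, so all ticks for one instrument land on one
# partition and stay ordered.
import json
import math
import random
import time

from kafka import KafkaProducer

TICKERS = ["AAPL", "MSFT", "SPY"]   # illustrative universe
SPREAD_BPS = 5                      # configurable bid-ask spread, in basis points
DRIFT, VOL = 0.0, 0.0005            # per-tick random-walk parameters (assumed)

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode(),
)

prices = {t: 100.0 for t in TICKERS}

while True:
    ticker = random.choice(TICKERS)
    # Geometric random walk for the mid-price.
    prices[ticker] *= math.exp(random.gauss(DRIFT, VOL))
    mid = prices[ticker]
    half_spread = mid * SPREAD_BPS / 2 / 10_000
    tick = {
        "ticker": ticker,
        "bid": round(mid - half_spread, 4),
        "ask": round(mid + half_spread, 4),
        "volume": random.randint(100, 10_000),
        "ts": time.time(),
    }
    producer.send("market_data", key=ticker, value=tick)
```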

Streaming VaR computation

Spark Structured Streaming joins live price ticks against a position snapshot loaded at startup. For each 5-second micro-batch, it computes mark-to-market P&L, updates the rolling 30-day return history, and recalculates Historical Simulation VaR at 95% confidence. Results are written to the risk_snapshots Kafka topic and persisted to PostgreSQL.
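
A condensed sketch of the streaming job, assuming PySpark 3.x with the Kafka connector and a PostgreSQL JDBC driver on the classpath. The tick schema, connection details, and a parquet sink standing in for the PostgreSQL write are all assumptions, not the exact engine code.

```python
# Stream-static join of live ticks against a startup position snapshot,
# with mark-to-market P&L computed per 5-second micro-batch.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("risk-engine").getOrCreate()

tick_schema = StructType([
    StructField("ticker", StringType()),
    StructField("bid", DoubleType()),
    StructField("ask", DoubleType()),
    StructField("ts", DoubleType()),
])

# Position snapshot loaded once at startup (the static side of the join).
positions = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/risk")
    .option("dbtable", "positions")   # assumed columns: ticker, quantity, entry_price
    .option("user", "risk").option("password", "risk")
    .load()
)

ticks = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "market_data")
    .load()
    .select(F.from_json(F.col("value").cast("string"), tick_schema).alias("t"))
    .select("t.*")
    .withColumn("mid", (F.col("bid") + F.col("ask")) / 2)
)

def mark_batch(batch_df, batch_id):
    # Mark-to-market P&L for this micro-batch; the rolling 30-day VaR
    # recalculation (see the methodology section) runs on these prices.
    pnl = batch_df.withColumn(
        "pnl", (F.col("mid") - F.col("entry_price")) * F.col("quantity"))
    pnl.write.mode("append").parquet("/tmp/risk_snapshots")

(ticks.join(positions, "ticker")      # stream-static join, keyed by ticker
    .writeStream
    .foreachBatch(mark_batch)
    .trigger(processingTime="5 seconds")
    .option("checkpointLocation", "/tmp/checkpoints/risk")
    .start()
    .awaitTermination())
```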

API and dashboard

FastAPI exposes risk query endpoints: current VaR by portfolio or asset class, historical VaR timeseries, position-level attribution, and scenario analysis. Redis caches the most recent risk snapshot per portfolio for sub-millisecond API response times. Streamlit polls the API every 2 seconds and renders live P&L curves, VaR gauges, and exposure heatmaps.
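
A minimal sketch of the cached read path, assuming redis-py and a risk:{portfolio_id} key holding the latest JSON snapshot (the key scheme and route are assumptions):

```python
# One risk endpoint served entirely from the Redis snapshot cache.
import json

import redis
from fastapi import FastAPI, HTTPException

app = FastAPI()
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

@app.get("/risk/{portfolio_id}")
def current_risk(portfolio_id: str):
    # Redis holds the most recent snapshot per portfolio, so this path
    # never touches PostgreSQL and stays sub-millisecond.
    raw = cache.get(f"risk:{portfolio_id}")
    if raw is None:
        raise HTTPException(status_code=404, detail="no snapshot yet")
    return json.loads(raw)
```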

VaR Methodology: Historical Simulation

Historical Simulation VaR was chosen over parametric (variance-covariance) and Monte Carlo for a deliberate reason: it makes no distributional assumptions. Equity returns are fat-tailed and skewed. A parametric model that assumes normality will systematically underestimate tail risk. Historical simulation uses the actual observed return distribution, so the 2008 crash, the 2020 COVID selloff, and the 2022 rate shock all live in the historical window and inform the VaR calculation.

The implementation maintains a 30-day rolling window of daily returns per instrument. On each Spark micro-batch, it recomputes the P&L distribution by applying historical returns to the current position, sorts the distribution, and reads the 5th percentile as the 95% 1-day VaR. Component VaR is computed per asset, which makes attribution possible: you can see which positions contribute most to portfolio tail risk.
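
A worked sketch of the per-batch math in NumPy. The quantile-scenario attribution used for component VaR here is one common scheme and an assumption about the engine's internals:

```python
import numpy as np

def historical_var(returns: np.ndarray, position_values: np.ndarray,
                   confidence: float = 0.95):
    """returns: (n_days, n_assets) daily returns from the rolling window.
    position_values: (n_assets,) current mark-to-market value per position."""
    # Apply each historical day's returns to today's positions.
    pnl = returns @ position_values        # (n_days,) simulated portfolio P&L
    order = np.argsort(pnl)
    k = int(np.floor((1 - confidence) * len(pnl)))
    tail_day = order[k]                    # scenario nearest the 5th percentile
    var = -pnl[tail_day]                   # 95% 1-day VaR, as a positive loss
    # Component VaR: each asset's loss in the quantile scenario. The
    # components sum exactly to the total VaR, keeping attribution additive.
    component = -returns[tail_day] * position_values
    return var, component

# Illustrative call: 30-day window, 5 positions.
rng = np.random.default_rng(0)
var, component = historical_var(rng.normal(0, 0.02, (30, 5)),
                                np.array([1e6, 5e5, 2e5, 8e5, 3e5]))
```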

Figure: VaR timeseries - 30-day rolling 95% VaR vs realized P&L, with VaR breaches marked to show backtesting accuracy; asset-level VaR attribution heatmap on the right.

Live Risk Dashboard

The Streamlit dashboard refreshes every 2 seconds and renders four panels:

  • Portfolio P&L curve - intraday cumulative P&L with entry price baseline
  • VaR gauge - current 1-day 95% VaR as percentage of portfolio NAV, with a red threshold line at 2%
  • Exposure breakdown - long/short gross exposure by asset class (equities, ETFs, futures)
  • Risk contributors - top 5 positions by component VaR contribution, sortable by delta or VaR

Figure: Streamlit dashboard - P&L curve (top left), VaR gauge at 1.7% of NAV (top right), asset-class exposure bar chart (bottom left), top risk contributors table (bottom right). Auto-refreshes every 2 seconds.
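
A trimmed sketch of the polling loop, assuming the /risk endpoint above; the response fields (var_pct_nav, pnl_curve) are illustrative:

```python
# Two of the four panels, redrawn into a single placeholder every 2 seconds.
import time

import requests
import streamlit as st

API = "http://localhost:8000/risk/main"   # assumed portfolio id

st.title("Live Portfolio Risk")
placeholder = st.empty()

while True:
    snap = requests.get(API, timeout=1).json()
    with placeholder.container():
        st.metric("1-day 95% VaR (% of NAV)", f"{snap['var_pct_nav']:.2f}%")
        st.line_chart(snap["pnl_curve"])   # intraday cumulative P&L
    time.sleep(2)                          # the 2-second poll interval
```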

Honest Assessment: What Worked and What Didn't

What worked

Kafka partitioning by ticker. Partitioning the market data topic by symbol guaranteed ordered delivery per instrument. This made the Spark join against position data deterministic - no race conditions between out-of-order price ticks for the same ticker.

What failed

Historical simulation window size. A 30-day window is too short to capture low-frequency stress events, so in calm market periods the VaR estimate is systematically too low. It needed a secondary stressed-VaR calculation using a 2008 or 2020 scenario window - I didn't build this.

What worked

Redis position cache. Loading position data once at startup and caching in Redis eliminated repeated PostgreSQL reads on every Spark micro-batch. API response times stayed sub-millisecond even at peak tick rates.
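
A sketch of that one-time startup load, assuming psycopg2 and redis-py; the positions table columns and hash layout are assumptions:

```python
# Load positions from PostgreSQL once, then serve every micro-batch and
# API call from Redis hashes.
import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
conn = psycopg2.connect("dbname=risk user=risk password=risk host=localhost")

with conn, conn.cursor() as cur:
    cur.execute("SELECT ticker, quantity, entry_price FROM positions")
    for ticker, qty, entry in cur.fetchall():
        cache.hset(f"position:{ticker}",
                   mapping={"quantity": qty, "entry_price": entry})
```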

What failed

Streamlit polling architecture. Polling the FastAPI service every 2 seconds creates unnecessary load and causes visible refresh lag. WebSockets or server-sent events would push updates only when new VaR snapshots arrive, eliminating the polling overhead entirely.
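
A sketch of the proposed fix using FastAPI's StreamingResponse for SSE; snapshot_updates() is a hypothetical stand-in for a source that fires on each Spark micro-batch completion:

```python
# Server-sent events endpoint: clients hold one connection open and
# receive a snapshot only when a new one exists.
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def snapshot_updates():
    # Stand-in source; a Redis pub/sub channel keyed to micro-batch
    # completion would be the cleaner trigger.
    while True:
        await asyncio.sleep(1)
        yield {"var_pct_nav": 1.7}   # illustrative payload

@app.get("/risk/stream")
async def risk_stream():
    async def event_source():
        async for snap in snapshot_updates():
            # SSE framing: one "data: <json>" line plus a blank line per event.
            yield f"data: {json.dumps(snap)}\n\n"
    return StreamingResponse(event_source(), media_type="text/event-stream")
```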

What worked

Component VaR attribution. Computing per-asset component VaR made the dashboard useful beyond just a total number. Seeing that a single position accounts for 40% of portfolio VaR is actionable - just seeing total VaR isn't.

What failed

Spark local mode for development. Running Spark in local mode masked memory and executor issues that appeared in distributed mode. Should have used a Docker Compose Spark cluster from day one - switching modes late in development required reconfiguring checkpoint paths and memory settings.

What I Would Build Next

  • Stressed VaR: secondary calculation using 2008, 2020, and 2022 historical stress periods as scenario overlays
  • WebSocket push: replace Streamlit polling with SSE or WebSocket from FastAPI, driving real-time dashboard updates on each Spark micro-batch completion
  • Greeks for options: extend the position model to include delta, gamma, and vega exposure if options are in the portfolio - changes VaR computation significantly
  • Backtesting harness: replay six months of historical data through the VaR engine, count breaches, and validate that the 95% confidence level actually holds over time (see the sketch after this list)
  • Multi-currency support: normalize positions to a base currency using FX rates from a secondary Kafka topic, enabling cross-currency portfolio risk
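
A sketch of the breach-count core of that backtesting harness, assuming the daily VaR estimates and realized P&L from the replay are available as arrays:

```python
# Breach-rate check: a well-calibrated 95% VaR should be exceeded on
# roughly 5% of days over the replay window.
import numpy as np

def breach_rate(realized_pnl: np.ndarray, var_estimates: np.ndarray) -> float:
    """Fraction of days where the realized loss exceeded the VaR estimate
    (VaR stored as a positive loss, matching the engine convention)."""
    breaches = realized_pnl < -var_estimates
    return breaches.mean()

# Illustrative data: ~6 months of trading days with a flat VaR series.
rng = np.random.default_rng(1)
pnl = rng.normal(0, 1e4, 126)
var = np.full(126, 1.65e4)
print(f"breach rate: {breach_rate(pnl, var):.1%}  (target ~5%)")
```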