A streaming financial risk platform: Kafka-driven market data ingestion, Apache Spark Structured Streaming for rolling VaR calculations, FastAPI for risk query endpoints, and a Streamlit dashboard with live P&L and exposure breakdowns. Built to process multi-asset portfolios with sub-second risk metric updates.
Traditional risk systems compute Value at Risk (VaR) overnight. A portfolio manager who changes positions intraday doesn't know their updated risk exposure until the next morning. In volatile markets, that lag is not a workflow inconvenience - it's a risk management failure.
The secondary problem is architecture: financial data arrives as a firehose of price ticks. Computing portfolio-level VaR from streaming tick data requires joining live prices against position records, maintaining rolling historical windows, and running Monte Carlo or historical simulation on each update. A batch system is fundamentally the wrong tool for this; streaming with Spark turns it into a tractable problem.
The pipeline has four distinct layers, with Kafka as the central nervous system.
A Kafka producer simulates realistic equity price ticks: mid-price with configurable bid-ask spread, volume weighting, and random-walk drift. It produces to a market_data topic partitioned by ticker symbol for ordering guarantees within each instrument, and has been tested to a sustained 500+ ticks/second.
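A minimal sketch of such a producer, assuming kafka-python and an illustrative tick schema (the field names and parameter defaults here are hypothetical, not the project's actual schema):

```python
import json
import random

def next_tick(ticker, last_mid, spread_bps=5, drift=0.0, vol=0.001):
    """One random-walk step: drift plus Gaussian noise on the mid price,
    wrapped with a symmetric bid-ask spread. Illustrative field names."""
    mid = last_mid * (1 + drift + random.gauss(0, vol))
    half_spread = mid * spread_bps / 2 / 10_000
    return {
        "ticker": ticker,
        "bid": round(mid - half_spread, 4),
        "ask": round(mid + half_spread, 4),
        "mid": round(mid, 4),
        "volume": random.randint(100, 10_000),
    }

def publish(producer, ticks, topic="market_data"):
    """Key each message by ticker so every tick for a given symbol lands on
    the same partition, preserving per-instrument ordering."""
    for tick in ticks:
        producer.send(topic, key=tick["ticker"].encode(),
                      value=json.dumps(tick).encode())

if __name__ == "__main__":
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    publish(producer, [next_tick("AAPL", 190.0)])
    producer.flush()
```

Keying by ticker, rather than round-robin, is what makes the per-instrument ordering guarantee hold across partitions.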
Spark Structured Streaming joins live price ticks against a position snapshot loaded at startup. For each 5-second micro-batch, it computes mark-to-market P&L, updates the rolling 30-day return history, and recalculates Historical Simulation VaR at 95% confidence. Results are written to the risk_snapshots Kafka topic and persisted to PostgreSQL.
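The join-and-mark step might look roughly like this sketch, assuming JSON tick values and a position snapshot in Parquet with `ticker`, `qty`, and `avg_cost` columns (all names illustrative):

```python
def unrealized_pnl(qty, avg_cost, mid):
    """Mark-to-market P&L of one position at the latest mid price."""
    return qty * (mid - avg_cost)

def build_stream(spark, positions_path="positions.parquet"):
    """Join the live tick stream against a static position snapshot.
    Topic and column names are illustrative, not the project's schema."""
    from pyspark.sql import functions as F  # pip install pyspark

    positions = spark.read.parquet(positions_path)  # ticker, qty, avg_cost
    ticks = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "market_data")
        .load()
        .select(
            F.get_json_object(F.col("value").cast("string"),
                              "$.ticker").alias("ticker"),
            F.get_json_object(F.col("value").cast("string"),
                              "$.mid").cast("double").alias("mid"),
        )
    )
    # Per-position mark-to-market P&L, recomputed each micro-batch
    return (
        ticks.join(positions, "ticker")
        .withColumn("pnl", F.col("qty") * (F.col("mid") - F.col("avg_cost")))
    )
```

Because the positions side is a static DataFrame, the stream-static join needs no watermarking; only the tick side moves.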
FastAPI exposes risk query endpoints: current VaR by portfolio or asset class, historical VaR timeseries, position-level attribution, and scenario analysis. Redis caches the most recent risk snapshot per portfolio for sub-millisecond API response times. Streamlit polls the API every 2 seconds and renders live P&L curves, VaR gauges, and exposure heatmaps.
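A hedged sketch of the cached read path, assuming the Spark job writes each snapshot to Redis under a key like `risk:snapshot:<portfolio_id>` (a hypothetical naming scheme):

```python
import json

def snapshot_key(portfolio_id: str) -> str:
    """Redis key under which the latest risk snapshot for a portfolio is
    assumed to be written by the streaming job (hypothetical naming)."""
    return f"risk:snapshot:{portfolio_id}"

def create_app(redis_url="redis://localhost:6379/0"):
    """Wire a FastAPI app that serves the most recent cached snapshot."""
    from fastapi import FastAPI, HTTPException  # pip install fastapi
    import redis                                # pip install redis

    app = FastAPI()
    cache = redis.Redis.from_url(redis_url, decode_responses=True)

    @app.get("/risk/{portfolio_id}")
    def current_risk(portfolio_id: str):
        # Single GET against the in-memory cache; PostgreSQL is only
        # touched by the historical-timeseries endpoints, not this one.
        raw = cache.get(snapshot_key(portfolio_id))
        if raw is None:
            raise HTTPException(status_code=404, detail="no snapshot yet")
        return json.loads(raw)

    return app
```

Serving the hot path entirely from Redis is what keeps the current-VaR endpoint sub-millisecond regardless of tick rate.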
Historical Simulation VaR was chosen over parametric (variance-covariance) and Monte Carlo for a deliberate reason: it makes no distributional assumptions. Equity returns are fat-tailed and skewed. A parametric model that assumes normality will systematically underestimate tail risk. Historical simulation uses the actual observed return distribution, so the 2008 crash, the 2020 COVID selloff, and the 2022 rate shock all live in the historical window and inform the VaR calculation.
The implementation maintains a 30-day rolling window of daily returns per instrument. On each Spark micro-batch, it recomputes the P&L distribution by applying historical returns to the current position, sorts the distribution, and reads the 5th percentile as the 95% 1-day VaR. Component VaR is computed per asset, enabling attribution: identifying which positions contribute most to portfolio tail risk.
The Streamlit dashboard refreshes every 2 seconds and renders four panels: live P&L curves, VaR gauges, exposure heatmaps, and per-position VaR attribution.
Kafka partitioning by ticker. Partitioning the market data topic by symbol guaranteed ordered delivery per instrument. This made the Spark join against position data deterministic - no race conditions between out-of-order price ticks for the same ticker.
Historical simulation window size. A 30-day window is too short to capture low-frequency stress events. In calm market periods, the VaR estimate will be systematically too low. Needed a secondary stressed VaR calculation using a 2008 or 2020 scenario window - didn't build this.
Redis position cache. Loading position data once at startup and caching in Redis eliminated repeated PostgreSQL reads on every Spark micro-batch. API response times stayed sub-millisecond even at peak tick rates.
Streamlit polling architecture. Polling the FastAPI every 2 seconds creates unnecessary load and causes visible refresh lag. WebSockets or server-sent events would push updates only when new VaR snapshots arrive, eliminating the polling overhead entirely.
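A sketch of the push-based alternative with FastAPI WebSockets; here `subscribe` is a placeholder for whatever event source delivers new snapshots (a Redis pub/sub channel, or a consumer on the risk_snapshots topic):

```python
import json

def render_frame(snapshot: dict) -> str:
    """Serialize one risk snapshot as a dashboard frame (pure, testable)."""
    return json.dumps({"type": "var_update", "data": snapshot})

def attach_risk_feed(app, subscribe):
    """Register a WebSocket route that pushes a frame only when `subscribe`
    yields a new snapshot, replacing the 2-second polling loop.
    `subscribe(portfolio_id)` is assumed to be an async generator."""
    from fastapi import WebSocket  # pip install fastapi

    @app.websocket("/ws/risk/{portfolio_id}")
    async def risk_feed(websocket: WebSocket, portfolio_id: str):
        await websocket.accept()
        # One send per new snapshot - no traffic between updates.
        async for snapshot in subscribe(portfolio_id):
            await websocket.send_text(render_frame(snapshot))
```

The client-side change is symmetric: the dashboard holds one socket open instead of issuing a GET every 2 seconds.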
Component VaR attribution. Computing per-asset component VaR made the dashboard useful beyond just a total number. Seeing that a single position accounts for 40% of portfolio VaR is actionable - just seeing total VaR isn't.
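One simple way to compute that attribution, shown here as an illustrative sketch rather than the project's exact method: take the per-position P&L on the scenario day that sets the VaR, so the components sum to the total figure.

```python
def component_pnl(position_values, returns, confidence=0.95):
    """Per-position loss contribution on the scenario day that sets the
    historical-simulation VaR. Components sum to the total VaR, so each
    position's share of tail risk reads off directly. Uses the same
    tail-index convention as a plain historical-simulation VaR."""
    day_pnl = [sum(r * v for r, v in zip(day, position_values))
               for day in returns]
    order = sorted(range(len(returns)), key=day_pnl.__getitem__)
    scenario = returns[order[int((1 - confidence) * len(returns))]]
    return [-r * v for r, v in zip(scenario, position_values)]
```

Dividing each component by the total turns this into the percentage view the dashboard needs, e.g. flagging a single position that carries 40% of portfolio VaR.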
Spark local mode for development. Running Spark in local mode masked memory and executor issues that appeared in distributed mode. Should have used a Docker Compose Spark cluster from day one - switching modes late in development required reconfiguring checkpoint paths and memory settings.