A real-time ML pipeline for financial fraud detection: Kafka-driven transaction ingestion, PySpark feature engineering, MLflow model lifecycle management, FastAPI scoring at sub-ms P99 latency, and a Grafana observability stack. Built to handle 100+ transactions per second with a full feedback loop from detection to model retraining.
Most fraud detection systems are batch-oriented: transactions are scored hours after they occur, by which time the fraudulent activity has already cascaded. Real-time scoring requires a completely different architecture - one where latency is measured in milliseconds, throughput in hundreds of transactions per second, and the model feedback loop runs continuously.
The secondary problem is model drift. Fraud patterns shift fast. A model trained three months ago has already been partially circumvented by new attack vectors. The system needed to support rapid retraining, A/B testing between model versions, and rollback without downtime.
The pipeline has five distinct layers, each with a clear boundary and independent failure mode.
A Kafka producer simulates a realistic transaction stream: varying amounts, merchant categories, geolocation signals, and behavioral anomalies. Transactions flow into a raw_transactions topic at a configurable TPS (tested to 100+ sustained), partitioned by account ID to preserve per-account ordering.
PySpark Structured Streaming consumes the raw topic and computes derived features: rolling velocity (transactions in the last 5 minutes and the last hour), merchant category risk score, geographic anomaly flag, and amount deviation from the 30-day mean. Output goes to a feature_vectors topic.
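A condensed sketch of the streaming job, assuming a JSON transaction payload with the illustrative schema below; only the 5-minute velocity is shown, since the 1-hour velocity and the other features follow the same pattern:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("feature-engineering").getOrCreate()

# Illustrative transaction schema; the real contract has more fields.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("merchant_category", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "raw_transactions")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("tx"))
    .select("tx.*")
)

# Rolling 5-minute velocity per account via a sliding window;
# the watermark bounds state so late events don't accumulate forever.
velocity_5m = (
    raw.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes", "1 minute"), "account_id")
    .count()
    .withColumnRenamed("count", "tx_count_5m")
)

query = (
    velocity_5m.select(
        F.col("account_id").alias("key"),
        F.to_json(F.struct("account_id", "tx_count_5m")).alias("value"),
    )
    .writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "feature_vectors")
    .option("checkpointLocation", "/tmp/checkpoints/features")
    .outputMode("update")
    .start()
)
```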
FastAPI loads the production model from MLflow at startup and keeps it in memory. No network hop per prediction - sub-ms P99 latency achieved by eliminating the model-loading roundtrip. Model swaps happen via a reload endpoint that pulls the new production model without dropping existing connections.
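A minimal sketch of that serving pattern, assuming a model registered in MLflow as fraud-detector; the endpoint paths and feature fields are illustrative:

```python
import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

MODEL_URI = "models:/fraud-detector/Production"  # illustrative registry name

app = FastAPI()
model = mlflow.pyfunc.load_model(MODEL_URI)  # loaded once at startup, kept in RAM

class FeatureVector(BaseModel):
    tx_count_5m: int
    tx_count_1h: int
    merchant_risk: float
    geo_anomaly: int
    amount_deviation: float

@app.post("/score")
def score(features: FeatureVector):
    # Hot path: no network or disk I/O, just an in-memory predict call.
    frame = pd.DataFrame([features.dict()])
    return {"fraud_probability": float(model.predict(frame)[0])}

@app.post("/reload")
def reload_model():
    # Swap in the current Production model without restarting the server;
    # in-flight requests keep the old object until the reference flips.
    global model
    model = mlflow.pyfunc.load_model(MODEL_URI)
    return {"status": "reloaded"}
```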
Every training run is tracked in MLflow: hyperparameters, dataset version, evaluation metrics, and the serialized model artifact. Models graduate through three stages in the MLflow Registry: Staging, Production, Archived. The Airflow retraining DAG runs nightly, compares the new model against the current Production baseline on a holdout set, and only promotes if recall improves and false positive rate stays below 2%.
This gave a meaningful CI/CD pattern for models: new versions are tested before they serve traffic, rollback is a single registry stage change, and experiment history is fully reproducible from any commit SHA.
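The promotion gate can be sketched roughly like this, assuming a registered model named fraud-detector and a hypothetical evaluate() helper that scores a model version on the fixed holdout set:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "fraud-detector"  # illustrative registry name

def promote_if_better(candidate_version: str) -> bool:
    """Promote the candidate to Production only if it beats the gate."""
    prod = client.get_latest_versions(MODEL_NAME, stages=["Production"])[0]

    # evaluate() is a hypothetical helper that returns
    # (recall, false_positive_rate) on the holdout set.
    cand_recall, cand_fpr = evaluate(candidate_version)
    prod_recall, _ = evaluate(prod.version)

    # Gate from the text: recall must improve and FPR must stay below 2%.
    if cand_recall > prod_recall and cand_fpr < 0.02:
        client.transition_model_version_stage(
            name=MODEL_NAME,
            version=candidate_version,
            stage="Production",
            archive_existing_versions=True,  # previous Production -> Archived
        )
        return True
    return False
```

Rollback is the same call in reverse: transition a previously archived version back to Production, which is why it needs no redeploy.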
Prometheus scrapes three endpoints: the FastAPI scoring service (prediction count, latency histogram, fraud rate), the Kafka consumer (consumer lag, processing time), and the Airflow scheduler (DAG run durations, task failures). Grafana pulls from Prometheus and renders four dashboards on top of those metrics.
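The scoring-service metrics can be exposed with prometheus_client along these lines; this is an instrumented variant of the /score handler from the serving sketch above, with illustrative metric names and threshold:

```python
import pandas as pd
from prometheus_client import Counter, Histogram, make_asgi_app

PREDICTIONS = Counter("predictions_total", "Total predictions served")
FRAUD_FLAGGED = Counter("fraud_flagged_total", "Predictions scored as fraud")
LATENCY = Histogram(
    "prediction_latency_seconds",
    "Scoring latency",
    buckets=(0.0001, 0.00025, 0.0005, 0.001, 0.0025, 0.005),  # sub-ms resolution
)

# Mounted on the FastAPI app from the serving sketch so Prometheus
# can scrape /metrics alongside /score.
app.mount("/metrics", make_asgi_app())

@app.post("/score")
def score(features: FeatureVector):
    with LATENCY.time():  # records wall-clock seconds into the histogram
        frame = pd.DataFrame([features.dict()])
        prob = float(model.predict(frame)[0])
    PREDICTIONS.inc()
    if prob > 0.5:  # illustrative decision threshold
        FRAUD_FLAGGED.inc()
    return {"fraud_probability": prob}
```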
In-memory model serving. Loading the MLflow model at FastAPI startup and keeping it in RAM was the single biggest latency win. Eliminated 15-40ms per prediction compared to loading per-request.
Schema evolution. Changing the feature schema mid-stream broke the PySpark consumer. Needed a schema registry (Confluent or AWS Glue) to version feature contracts properly.
MLflow model staging. The Staging → Production → Archived lifecycle prevented accidental promotion of underperforming models and gave clean rollback at any point.
Spark executor memory. Under sustained 100+ TPS, PySpark GC pressure caused processing-time spikes. Fixed by tuning spark.executor.memory and spark.sql.shuffle.partitions and switching from the default GC to G1GC; see the configuration sketch after this list.
Airflow LocalExecutor. Dropped the CeleryExecutor for local dev. Eliminated Redis + Celery dependencies and cut environment setup time from 20 minutes to under 5.
Kafka consumer group rebalancing. Under rapid producer scale-up, consumer group rebalancing caused processing gaps of up to 8 seconds. Mitigated with session.timeout.ms and heartbeat.interval.ms tuning (sketched below), but not fully resolved.
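The Spark tuning from the executor-memory lesson, sketched as session configuration; the values are illustrative starting points rather than benchmarked settings:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("feature-engineering")
    .config("spark.executor.memory", "4g")
    .config("spark.sql.shuffle.partitions", "32")  # down from the default 200 for this volume
    .config(
        "spark.executor.extraJavaOptions",
        "-XX:+UseG1GC",  # G1GC smooths the long stop-the-world pauses seen under load
    )
    .getOrCreate()
)
```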
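And the rebalancing mitigation, sketched as kafka-python consumer settings; the timeout values are illustrative:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "feature_vectors",
    bootstrap_servers="localhost:9092",
    group_id="scoring-consumers",       # illustrative group name
    session_timeout_ms=30000,           # tolerate slow heartbeats before a rebalance fires
    heartbeat_interval_ms=3000,         # heartbeat well inside the session timeout
    max_poll_interval_ms=300000,        # allow long processing batches without eviction
)
```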