Narendranath
Edara

I build AI systems that are actually in production - and I extract reusable engines from them so the next team doesn't rebuild what I already shipped.

Six years shipping data and AI systems across enterprise HR, food-tech, and fintech. Currently building the retrieval, workflow orchestration, and governed SQL layers that make AI useful in production - not just impressive in demos.

  • ExponentHR: Data platform engineering at scale - CI/CD ownership compressing deployment cycles from 3 months to 14 days, CDC ETL compute cost cut 67%, payroll-critical AAG database automation on Azure.
  • AutoApply AI: end-to-end discover → tailor → apply → track workflow - Chrome MV3, FastAPI, multi-provider LLM routing, 11 ATS adapters, live on Fly.io.
  • Modular engines: extracted the tailoring core into tailor-resume (MCP server, PyPI, CLI, Streamlit) and the discovery layer into JobScout - platform thinking, not tool accumulation.

Six years building data and AI systems across enterprise, food-tech, and fintech.

Jul 2024 - Present

Data Engineer

ExponentHR · Addison, TX

Data platform engineering / CI/CD ownership

  • Compressed deployment cycles from 3 months to 14 days by owning CI/CD end-to-end through Azure DevOps, eliminating 11 weeks of cross-team idle time per release.
  • Reengineered CDC ETL from full-table reloads to incremental merge upserts: runtime 30 min → 8 min, compute cost -67%.
  • Engineered one-click idempotent Azure DevOps pipeline for Contained AAG databases (restore, security, CDC, listener validation), eliminating ~1 hour per request across 20+ daily copy-downs.
Azure · Azure DevOps · Data Platform · CDC · Python · SQL
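The incremental merge-upsert pattern behind that CDC rework can be sketched in plain Python. This is an illustration of the pattern only; merge_upsert, the lsn watermark field, and the row shapes are hypothetical, not the production T-SQL.

```python
# Illustrative sketch of incremental merge-upsert: instead of reloading
# the full table, only CDC rows past the last high-water mark are
# merged into the target, keyed by primary key.

def merge_upsert(target: dict, cdc_rows: list, watermark: int) -> int:
    """Apply CDC rows newer than `watermark` to `target`; return rows touched."""
    touched = 0
    for row in cdc_rows:
        if row["lsn"] <= watermark:       # already applied in a prior run
            continue
        if row["op"] == "delete":
            target.pop(row["id"], None)
        else:                             # insert or update -> upsert
            target[row["id"]] = row["data"]
        touched += 1
    return touched

# Usage: only the two rows past the watermark are applied.
table = {1: "a", 2: "b"}
cdc = [
    {"lsn": 5,  "op": "update", "id": 1, "data": "a2"},  # at watermark: skipped
    {"lsn": 12, "op": "insert", "id": 3, "data": "c"},
    {"lsn": 13, "op": "delete", "id": 2, "data": None},
]
applied = merge_upsert(table, cdc, watermark=5)
print(applied, table)   # -> 2 {1: 'a', 3: 'c'}
```

The runtime win comes from the skip on the watermark check: unchanged rows are never read, so cost scales with the change volume rather than the table size.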
Aug 2023 - Jul 2024

Data Engineer

Missouri University of Science and Technology · Rolla, MO

ML infrastructure / anomaly detection  ·  M.S. Data Science - GPA 4.0 - Dec 2023

  • Engineered Azure AI Anomaly Detector pipelines selecting optimal algorithms per time-series profile, achieving 95%+ detection accuracy - caught a production memory leak 4 hours before outage.
  • Implemented tunable alert thresholds, filtering ~250 weekly non-actionable P3 alerts; signal-to-noise improved from 1:5 to 1:1.2.
  • Migrated from static D-Series VMs to AKS with HPA: CPU utilization 12%→64%, nodes consolidated from 20 static to 4-8 dynamic, Azure spend cut $3,200/month.
Azure AI · AKS/HPA · Kubernetes · Anomaly Detection · Python
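The tunable-threshold idea above reduces to a per-series severity filter: anomalies whose score falls below the threshold tuned for their time series never page anyone. A hedged sketch, with illustrative function and field names rather than the actual pipeline code:

```python
# Hypothetical sketch of tunable alert thresholds: suppress anomalies
# whose deviation score falls below a per-series threshold, so
# low-severity (P3) noise never reaches the on-call channel.

def actionable(alerts, thresholds, default=0.8):
    """Keep alerts whose score meets the threshold tuned for their series."""
    return [a for a in alerts
            if a["score"] >= thresholds.get(a["series"], default)]

alerts = [
    {"series": "cpu", "score": 0.95},   # real signal: kept
    {"series": "cpu", "score": 0.40},   # noise: dropped by cpu threshold
    {"series": "mem", "score": 0.70},   # kept: mem threshold is lower
]
kept = actionable(alerts, thresholds={"cpu": 0.9, "mem": 0.6})
print(len(kept))   # -> 2
```

Tuning the per-series thresholds (rather than one global cutoff) is what moves signal-to-noise from 1:5 toward 1:1.2 without dropping genuine anomalies.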
Jun 2023 - Aug 2023

Engineering Intern

C2FO · Leawood, KS

  • Analyzed B2B financial transaction patterns via SQL to identify user behavior trends and inform product prioritization across the preferred offers tool.
  • Authored data-driven PRDs that reduced development resource allocation time by 50% by eliminating spec ambiguity across engineering, design, and business stakeholders.
SQL · Data Analysis · PRD
Sep 2020 - Mar 2021

Business Intelligence Analyst | Supply

udaan.com · India

  • Built predictive demand forecasting models driving $4 million annual savings and 7% ROI improvement through optimized inventory allocation.
  • Achieved 99.3% fulfillment rate through statistical capacity planning managing end-to-end first-mile to last-mile operations.
  • Developed automated ETL pipelines for financial modeling dashboards in Power BI.
Forecasting · Power BI · SQL · ETL · Supply Chain
Mar 2018 - Sep 2020

Business Analyst

Zomato · Hyderabad, India

  • Built real-time competitor analytics platform tracking pricing, delivery times, and coverage - fed pricing changes that contributed to 9% market share gain in contested metros.
  • Optimized search relevance with ranking models using contextual signals (time, location, cuisine affinity), improving search-to-conversion across millions of daily queries.
  • Built Elasticsearch enterprise search indexing 100K+ documents, reducing Support Desk volume ~80% through self-serve query resolution.
Elasticsearch · SQL · Ranking Models · Competitor Analytics

M.S. Data Science

Missouri University of Science and Technology · Rolla, MO

GPA 4.0 / 4.0 · Jan 2022 - Dec 2023 · Taylor & Francis publication
  • DP-700 Fabric Data Engineer Associate · Microsoft
  • AI-900 Azure AI Fundamentals · Microsoft
  • Generative AI Fundamentals · Databricks
  • Data Warehouse in Microsoft Fabric · Applied Skills, Microsoft
  • SQL (Advanced) · HackerRank
  • Certified Scrum Product Owner · Scrum Alliance

Stack I work in daily.

Senior+ AI/ML platform tools, organized by domain. Every tile is a tool I've shipped with — no aspirational filler.

AI, LLM & RAG

  • Anthropic SDK - Python SDK for Claude API + tool use + prompt caching
  • Claude - Primary LLM in production cascade (Sonnet 4.6 default)
  • GPT-4o - OpenAI fallback in the multi-LLM cascade
  • Hugging Face - Transformers + sentence-embedding models for RAG
  • LangChain - RAG orchestration where prompt-chain depth justifies it
  • MCP - Model Context Protocol; shipped the tailor-resume MCP server

Languages & Frameworks

  • Python - Primary backend + ML language across every system
  • TypeScript - Chrome MV3 extension + type-safe API contracts
  • SQL - Postgres-flavored: query tuning, partitions, EXPLAIN
  • FastAPI - Async Python backend on every shipped service
  • Pydantic - Schema validation at every API boundary
  • Streamlit - Internal dashboards + tailor-resume UI

Data & Storage

  • PostgreSQL - Primary OLTP store; managed Postgres in prod
  • SQL Server - Microsoft SQL Server: enterprise OLTP + T-SQL
  • pgvector - Postgres extension powering RAG embeddings
  • Redis - Session cache + queue; Upstash serverless in prod
  • Kafka - Streaming ingest in Portfolio Risk Analytics

Lakehouse & Compute

  • Spark - Structured Streaming consumer for Kafka topics
  • Databricks - Lakehouse platform for production ML + analytics workloads
  • SSIS - SQL Server Integration Services ETL pipelines
  • Delta Lake - Versioned, ACID-safe lakehouse tables on Spark
  • Iceberg - Open table format: petabyte-scale data lakes, time travel

Cloud & Deployment

  • AWS - S3 + IAM + CloudWatch on production deployments
  • Azure - Azure Functions + Storage in prior enterprise work
  • Docker - Multi-stage builds for every shipped service
  • Kubernetes - Staging clusters for AutoApply experiments
  • Fly.io - AutoApply AI production regions: IAD + DFW

Observability & Ops

  • Prometheus - Metrics scraping in containerized services
  • Grafana - Dashboards for ingest lag + provider latency
  • Sentry - Backend + extension error tracking on AutoApply
  • MLflow - Experiment tracking for fraud-detection pipelines
  • Airflow - DAG orchestration in the fraud-detection ML pipeline
  • GitHub Actions - CI on every repo: lint, types, tests, deploy

Key Accomplishments

  1. Deployment velocity

    Cycles cut from 3 months to 14 days by owning CI/CD end-to-end through Azure DevOps. Eliminated 11 weeks of cross-team idle time per release.

  2. CDC ETL optimization

    Reengineered CDC ETL from full-table reloads to incremental merge upserts: runtime 30 min to under 8 min, compute cost -67%.

  3. Always-on platform

    AutoApply AI runs live on Fly.io - zero cold start, 9-page React dashboard, 40+ FastAPI endpoints in production.

  4. Four distribution surfaces

    tailor-resume ships as CLI, Streamlit app, MCP server, and PyPI package - from one shared engine.

Technical Skills

  1. Data platform

    LLM-integrated data platform on Azure with governance enforced at the generation layer. RLS/CLS access control baked into the data product, not bolted on after.

  2. Workflow stack

    Chrome MV3, FastAPI, PostgreSQL, and Redis orchestrate the full discover → tailor → apply → track lifecycle.

  3. ML infrastructure

    Azure AI anomaly detection - 95%+ accuracy, AKS/HPA migration cutting Azure spend $3,200/month. Signal-to-noise from 1:5 to 1:1.2.

  4. MCP protocol

    tailor-resume is an MCP server - any Claude Code session or MCP-aware agent calls it directly, zero integration work. 97M monthly SDK downloads.

Four production systems, each independently deployable.

Built end-to-end, tested in production, and modular enough that other teams can use the engines without adopting the whole stack.

Workflow Platform

AutoApply AI

discover → tailor → apply → track

40+ API endpoints 11 ATS adapters 6 LLM providers 9-page dashboard
FastAPI · Chrome MV3 · PostgreSQL · Redis · Fly.io · React
Reusable Engine

tailor-resume

Extracted core, ships independently. 190 tests. Any team can consume it without touching AutoApply AI.

pip install tailor-resume
MCP server - any agent calls it, zero integration
CLI - zero config, cloud features opt-in
Streamlit - browser UI for non-technical users
Discovery Layer

JobScout

Keeps discovery signal clean. Runs independently so the apply workflow never inherits stale or low-quality data.

130+ company career pages monitored
6 ATS platforms covered
Sponsorship-aware ranking built in
Continuous alerting, no manual polling
Shared infrastructure: Claude · OpenAI · Gemini · Groq · pgvector / RAG · Multi-provider cascade · MCP Protocol

AutoApply AI

40+
FastAPI endpoints — verified by counting @router decorators in the AutoApply backend.
11
ATS adapters — eleven content-script adapters (Greenhouse, Lever, Workday, Ashby, SmartRecruiters, iCIMS, BambooHR, Jobvite, Taleo, Recruitee, Workable) live in the extension's ats/ directory.
6
LLM providers — a cascading provider chain across Claude, GPT-4o, Kimi, Ollama, Gemini, and Groq, all defined in the backend's providers service.
190
automated tests — 190 pytest cases in the AutoApply backend, verifiable with pytest --collect-only.

Applying for jobs is a ritual that eats weeks without improving outcomes. I rebuilt the process from scratch - Chrome MV3 extension detects the job form, tailors the resume using an LLM cascade across 6 providers, submits through 11 different ATS systems, and logs everything to a React dashboard. No manual copy-paste. No lost applications. Live on Fly.io.
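The cascade logic can be approximated in a few lines. A minimal sketch with illustrative names (Provider, cascade, flaky), not the AutoApply internals: try providers in priority order, skip any whose circuit breaker is open, and return the first usable answer, with keyword extraction as the no-LLM degraded mode.

```python
# Minimal provider-cascade sketch with a per-provider circuit breaker.

class Provider:
    def __init__(self, name, fn, max_failures=3):
        self.name, self.fn = name, fn
        self.failures, self.max_failures = 0, max_failures

    @property
    def open(self):                        # breaker open -> skip provider
        return self.failures >= self.max_failures

def cascade(providers, prompt):
    for p in providers:
        if p.open:
            continue
        try:
            answer = p.fn(prompt)
            if answer:                     # "usable" check; first hit wins
                p.failures = 0
                return p.name, answer
        except Exception:
            p.failures += 1                # trip toward open on failure
    return "keyword", prompt.split()[:5]   # degraded no-LLM mode

def flaky(prompt):      # stand-in for a provider that times out
    raise TimeoutError

providers = [Provider("claude", flaky), Provider("gpt-4o", lambda p: "ok")]
name, answer = cascade(providers, "tailor this resume")
print(name)   # -> gpt-4o
```

Note the breaker counts failures rather than trusting HTTP status codes alone; the real routing also has to classify "usable" much more carefully, since LLM failures are often well-formed garbage rather than errors.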

FastAPI · Chrome MV3 · PostgreSQL · Redis · pgvector / RAG · React
Read case study GitHub

tailor-resume

190
automated tests — 190 pytest cases covering the tailoring engine, verifiable with pytest --collect-only.
4
distribution surfaces — four delivery surfaces from one shared core: PyPI package, MCP server hosted on Fly.io, Streamlit web app, and a command-line interface.
MCP
server live on Fly.io — model-context-protocol compatible endpoint discoverable by any MCP-aware agent at tailor-resume-mcp.fly.dev.

The tailoring engine inside AutoApply AI was too useful to keep locked inside one product. Friends wanted it. Agents could call it. So I extracted it properly - 190 tests, four distribution surfaces - and shipped it so any developer can pip install tailor-resume and any MCP-aware agent can call it directly, zero integration work required.

Python · MCP server · Streamlit · PyPI package · LaTeX · Fly.io
Read case study GitHub

JobScout

130+
career pages monitored — 130+ company-specific scrapers configured in the JobScout target list.
6
ATS platforms — six applicant-tracking-system adapters: Greenhouse, Lever, Workday, Ashby, SmartRecruiters, and iCIMS.
95+
resumes tailored — 95+ per-role tailored resume PDFs generated from the master profile, each tracked back to the job opening it was used to apply to.

Job boards are noisy by design - most listings are duplicated, aggregated, or irrelevant before you open them. I built JobScout to monitor 130+ company career pages directly, bypassing the middleman, with sponsorship-aware ranking built in. Clean input upstream means better applications downstream.
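Sponsorship-aware ranking amounts to weighting sponsorship above the other signals. A hypothetical sketch, where the weights and field names are illustrative rather than JobScout's actual scoring:

```python
# Illustrative sponsorship-aware ranking: sponsorship dominates the
# score, with title match and freshness as secondary signals.

WEIGHTS = {"sponsor": 3.0, "title": 2.0, "fresh": 1.0}

def score(job):
    s = 0.0
    if job.get("sponsors_visa"):
        s += WEIGHTS["sponsor"]            # sponsorship dominates the rank
    if "engineer" in job.get("title", "").lower():
        s += WEIGHTS["title"]
    s += WEIGHTS["fresh"] / (1 + job.get("days_old", 0))  # freshness decay
    return s

jobs = [
    {"title": "Data Engineer", "sponsors_visa": False, "days_old": 0},
    {"title": "Data Engineer", "sponsors_visa": True,  "days_old": 9},
]
best = max(jobs, key=score)
print(best["sponsors_visa"])   # -> True
```

Because the sponsorship weight exceeds the maximum freshness bonus, a nine-day-old sponsoring role still outranks a brand-new non-sponsoring one — which is the point of baking the signal into the ranker instead of filtering after the fact.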

130+ companies · 6 ATS platforms · ranking engine · sponsorship-aware

Azure Platform Engineering

3mo→14d
deployment cycle — release cadence cut from three months to fourteen days by owning CI/CD end-to-end through Azure DevOps; eliminated about eleven weeks of cross-team idle time per release.
-67%
CDC ETL compute — CDC ETL runtime cut from thirty minutes to under eight minutes after reengineering full-table reloads into incremental merge-upserts.
~1hr
saved per AAG copy-down — idempotent Azure DevOps pipeline for Contained Always-On Availability Groups (restore plus security plus CDC plus listener validation) eliminated about an hour of manual orchestration on each of twenty-plus daily copy-down requests.

Production data platform work at ExponentHR: CI/CD ownership compressed deployment from 3 months to 14 days, CDC ETL -67% compute cost, payroll-critical AAG database failover automation, idempotent Azure DevOps pipelines.

Azure DevOps · CDC ETL · AAG failover · Microsoft Fabric

ML Pipeline Projects

Fraud Detection ML Platform

Streaming fraud scoring with Kafka ingest, FastAPI + LightGBM model serving, MLflow experiment tracking, and Prometheus + Grafana observability. Full model lifecycle in containers.

Portfolio Risk Analytics

Real-time risk analytics: Kafka streaming ingest, Spark historical-simulation VaR, FastAPI + Streamlit dashboard for live P&L visibility.

Portfolio-Risk real-time topology: a Kafka producer publishes synthetic market ticks at 50 messages per second to one topic with five partitions (acks=all, gzip compression). Spark Structured Streaming consumes them with five-second tumbling windows and ten-second watermarks on a single-node local[2] cluster. The current sink is stdout (a demo console writer); a CSV file sink to data/processed is the next planned step but is not yet wired. A FastAPI service computes historical-simulation VaR at 95% and 99% confidence intervals with numpy; until the file sink exists it falls back to synthetic data. The dashboard is a local Streamlit app. Single-node demo: Spark prints aggregates to stdout, and the CSV file handoff to FastAPI is planned, not shipped.
Figure: Portfolio-Risk topology, as it actually runs today. The Spark-to-FastAPI link is currently console-only; FastAPI falls back to synthetic data until the CSV sink lands.
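Historical-simulation VaR as used here reduces to a quantile of the historical P&L distribution. A pure-Python sketch, a stdlib stand-in for the service's numpy version, with illustrative names:

```python
# Historical-simulation VaR sketch: the loss at the (1 - confidence)
# quantile of observed P&L. Pure stdlib; the real service uses numpy.

def historical_var(pnl, confidence=0.95):
    """VaR as the loss exceeded on only (1 - confidence) of history."""
    losses = sorted(-p for p in pnl)               # losses, ascending
    idx = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[idx]

# 100 daily P&L observations: 95 small gains, then 5 large losses.
pnl = [1.0] * 95 + [-20.0, -25.0, -30.0, -40.0, -50.0]
print(historical_var(pnl, 0.95))   # -> 20.0
print(historical_var(pnl, 0.99))   # -> 50.0
```

With five losing days in a hundred, the 95% VaR lands exactly on the smallest of the five large losses — the historical-simulation method makes no distributional assumption, it just reads the empirical quantile.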

FinTune — Financial NLP

QLoRA fine-tuning on Mistral-7B, 4-bit quantized inference, PII guardrails, circuit breaker self-recovery, KL-divergence drift detection.

repo-context-hooks

330+
automated tests — 330+ pytest cases passing as of the v1.0 supply-chain hardening release; verifiable with pytest --collect-only.
Sigstore + CodeQL
supply-chain hardening — every release signed via Sigstore through GitHub Actions Trusted Publisher OIDC, plus CodeQL static analysis on every pull request; verifiable in the publish and codeql workflow files.
stdlib
only — zero runtime deps — the project dependencies array in pyproject.toml is empty; the package uses Python standard library only at runtime.

Pre/post-commit hooks that snapshot repo context for LLM agents. Sigstore-signed PyPI releases via OIDC Trusted Publisher; CodeQL static analysis on every PR; Dependabot tracks the GitHub Actions ecosystem.

Technical write-ups.

Problem, constraints, design, tradeoffs, outcome. Each post focuses on what didn't work and why.

Supply-chain audit pipeline: a PR triggers five Claude agents in parallel feeding one consensus report. The pull request enters from the left, fans out to five parallel agent reviewers (security, telemetry, contract, docs, packaging), and converges into a single consensus report on the right. 41 issues caught.
Supply-chain · 12 min read · Apr 2026

Five Claude agents audited my plugin release in parallel. They caught 41 issues.

A defender-side playbook for shipping a Claude Code plugin with PyPI-tier supply-chain hygiene — Sigstore signing, OIDC Trusted Publisher, CodeQL, property-tested telemetry, and the multi-agent review workflow that found everything I missed.

Read the write-up →
LLM cascade: Claude as primary, falling through to GPT-4o, Kimi, and Ollama, then keyword extraction, on each provider failure. A request flows left to right through five providers; each box is a fallback step, and the active path stops at the first provider returning a usable answer. Claude is the primary; keyword-extract is the no-LLM degraded mode. A circuit breaker wraps each provider: the first usable answer wins and the rest are skipped.
AI infra · draft · on Substack

The LLM cascade as a routing system, not a fallback.

Five providers, circuit breakers per provider, per-category routing, and a reward loop. What worked: Claude-first with cost-aware overflow. What didn't: naive failover by HTTP status code — LLMs fail in much weirder ways than 500s.

Read on Substack →
Azure DevOps pipeline compressing the deploy cycle from three months to fourteen days. A git push enters from the left, runs through cached build, parallel test, and security gates, then incremental CDC ETL into the database-target environment, landing in about fourteen days end-to-end. Before: 3 months, cross-team relay, manual gates. After: 14 days, idempotent, -67% compute.
Data platform · draft · on Substack

Compressing a deploy cycle from 3 months to 14 days.

CI/CD ownership at ExponentHR — Azure DevOps pipelines, idempotent CDC ETL with merge-upserts, AAG copy-down automation. Honest tradeoffs section: which gates I tried to remove and learned to keep.

Read on Substack →

External articles and peer-reviewed work.

Posts on Substack covering production decisions; peer-reviewed publication in Taylor & Francis.

Architecture

Chrome Extension ATS Adapters: Per-Platform Engineering

Why one generic form handler is the wrong abstraction when ATS platforms vary by DOM isolation, field detection logic, and submission behavior.

Read on Substack
System design

Two Booleans and a Production Bug: State Design in AutoApply AI

A seemingly simple boolean flag caused a state bug that took three false fixes before the real model emerged. What it taught about representing apply state.

Read on Substack
AI systems

The LLM Cascade as a Routing System, Not a Fallback

Five providers, circuit breakers, per-category routing, and a reward loop. Why the model routing layer is the most interesting engineering surface in AutoApply AI.

Read on Substack

An Examination of Sentiment Analysis as a Tool for Gathering Visitor Feedback

Dr. David Bojanic, Narendranath Edara, Jane Zhang

Journal of Nonprofit & Public Sector Marketing · Taylor & Francis · 2025

View publication →

What colleagues and managers wrote on LinkedIn.

LinkedIn recommendation
"Naren is a great talent in the product ownership space. He is a critical thinker and understands how to analyze a problem, and convert that problem into a tangible requirements doc. Most importantly, he is a quick and humble learner that loves researching the areas of the business that are new to him."
Richard Enright · Lead Product Manager, Harness
LinkedIn recommendation
"I had the pleasure of working closely with Narendranath for 3+ years and can confidently attest to his outstanding performance and exceptional work ethic. He had a keen eye for solving business problems by identifying trends to improve the overall business."
Snehal Moni · Operations & Policy Analyst, Oregon Health Authority
LinkedIn recommendation
"Narendranath has been an efficient source to the team at all times. He expertly trained himself to adapt to the various sectors of the business and was one of the few people who took up every challenge offered to him."
Pranav Suresh · Business Effectiveness, CIBC

Let's talk.

Based in Dallas TX, open to remote. I'm most useful to teams building production AI systems who want someone who can own the full stack - from retrieval design through deployment and observability.

Narendranath Edara

Response within 24 hours. Based in Dallas TX. Open to Senior AI Platform Engineer, Applied AI, and Backend AI roles.