Available June 2026 · Full-time

Machine learning engineer, working on production AI that actually ships.

§ Précis

I work the hard parts of LLM products — retrieval, evals, observability, cost routing — and ship them to production with measurable wins. Currently at AriesView; M.S. Machine Learning at Stevens, class of '26.

Resume ↓
Currently AriesView · Boston, MA
Studying M.S. ML, Stevens '26
Based New York, NY
Fig. 011 · Portrait of Rishi Chhabraa

Most teams know how to call an API. The hard parts are retrieval that doesn't lie, evals that don't drift, and bills that don't grow exponentially.

That's where I work. Production RAG with hybrid + graph retrieval, agentic systems wired up via MCP, semantic routers that cut cloud inference spend by two-thirds, and CI/CD gates that block hallucinations before they ship.

Selected work.


Enterprise MCP Server for Agentic LLMs

Open-source Model Context Protocol server exposing GitHub and internal CRUD APIs as tools to LLM agents. GitHub OAuth2 auth, token-bucket rate limiting via Redis (500 req/min). One-command Docker Compose deployment. 558 req/s throughput, 3.7ms health check latency, p50 API latency ~5ms.

Year 2026
Role Author
Stack Python · FastAPI · MCP · Claude API · OAuth2 · Redis · Docker
github.com/rchhabra13/mcp-agentic-llm ↗

Cost-Aware Semantic Gateway for LLM Routing

Heuristic prompt complexity classifier (0.06ms avg, 0.17ms p99 — no model inference) routing 79% of queries to local Llama 3.2 (vLLM) and complex tasks to GPT-4o. 93.4% routing accuracy across 81-query benchmark. Streamlit FinOps dashboard tracks per-query cost and routing decisions. 68.1% reduction in cloud inference spend ($0.4835 → $0.1542).

Year 2026
Role Author
Stack FastAPI · vLLM · Llama 3.2 · GPT-4o · SQLite · Streamlit
github.com/rchhabra13/cost-aware-semantic-routing ↗
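A router this fast cannot afford model inference, so the complexity signal has to come from cheap lexical features. The actual feature set isn't published; this sketch uses invented markers (prompt length, code fences, reasoning verbs) purely to show the shape of the decision:

```python
# Hypothetical markers; the real classifier's features are not public.
COMPLEX_MARKERS = {"analyze", "compare", "prove", "refactor", "derive"}


def route(prompt: str) -> str:
    """Return 'local' (Llama 3.2 via vLLM) or 'cloud' (GPT-4o).

    A few cheap lexical signals stand in for the sub-millisecond heuristic:
    long prompts, embedded code, and reasoning verbs escalate to GPT-4o.
    """
    words = prompt.lower().split()
    score = 0
    score += len(words) > 200                                   # long context
    score += "```" in prompt                                    # code present
    score += any(w.strip("?,.") in COMPLEX_MARKERS for w in words)
    return "cloud" if score >= 2 else "local"
```

Because every feature is a string operation, the whole decision stays in microseconds, which is what makes the 79% local-routing rate pay off.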

Automated LLM Evaluation & Observability Pipeline

CI/CD eval gate that runs a RAG chain through three DeepEval metrics (Faithfulness, Answer Relevancy, Contextual Precision) on every pull request using Claude Haiku as judge — blocks merges when any metric falls below 95%. LangSmith traces every retriever call and generation with inputs, outputs, latency, and token counts. Built to catch silent hallucination regressions from prompt or model changes before they ship.

Year 2026
Role Author
Stack DeepEval · LangSmith · GitHub Actions · OpenAI API
github.com/rchhabra13/llm-eval-pipeline ↗
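Once DeepEval has scored the three metrics, the merge-blocking step reduces to a threshold check whose nonzero exit fails the GitHub Actions job. A minimal sketch of that gate logic (function name and report format are illustrative):

```python
THRESHOLD = 0.95


def gate(scores: dict[str, float]) -> bool:
    """CI pass/fail decision: every metric must clear the 95% bar.

    Returns False (fail the job) if any metric falls below threshold,
    printing the offenders so the PR author sees what regressed.
    """
    failing = {name: s for name, s in scores.items() if s < THRESHOLD}
    for name, s in sorted(failing.items()):
        print(f"FAIL {name}: {s:.3f} < {THRESHOLD}")
    return not failing
```

In the workflow, `gate(...)` returning False maps to `sys.exit(1)`, which is what actually blocks the merge.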

LLM-Powered Hedge Fund Analyst Agent

Autonomous multi-agent equity analyst ingesting SEC 10-K filings, real-time prices, and earnings transcripts. LangGraph StateGraph orchestrates five sub-agents: filing parser, DCF valuation, FinBERT sentiment, moat assessment, and memo synthesis. Neo4j knowledge graph for cross-company comparative queries. Full pipeline traced end-to-end in LangSmith. Generates cited Buy/Hold/Sell memos with confidence scores in under 30 seconds.

Year 2026
Role Author
Stack LangGraph · Claude API · SEC EDGAR · yfinance · FinBERT · Neo4j · LangSmith · FastAPI · Docker
github.com/rchhabra13/hedge-fund-analyst ↗
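The five sub-agents form a linear state-passing pipeline: each node reads and extends a shared state dict. LangGraph's StateGraph replaces the plain loop below with typed nodes and edges; the stubs and numbers here are invented to show only the orchestration shape:

```python
from typing import Callable

State = dict  # shared state threaded through the sub-agents


def parse_filing(s: State) -> State:
    # Stub: the real node pulls and parses a 10-K from SEC EDGAR.
    s["filing"] = {"revenue": 100.0, "growth": 0.08}
    return s


def dcf_valuation(s: State) -> State:
    r, g, d = s["filing"]["revenue"], s["filing"]["growth"], 0.10
    # Toy Gordon-growth value: next year's cash flow over (discount - growth).
    s["value"] = r * (1 + g) / (d - g)
    return s


def sentiment(s: State) -> State:
    s["sentiment"] = 0.6  # stub for the FinBERT transcript score
    return s


def moat(s: State) -> State:
    s["moat"] = "narrow"  # stub for the moat-assessment agent
    return s


def synthesize(s: State) -> State:
    s["memo"] = {
        "rating": "Buy" if s["value"] > 1000 and s["sentiment"] > 0.5 else "Hold",
        "confidence": min(s["sentiment"], 0.9),
    }
    return s


PIPELINE: list[Callable[[State], State]] = [
    parse_filing, dcf_valuation, sentiment, moat, synthesize,
]


def run() -> State:
    state: State = {}
    for node in PIPELINE:  # StateGraph runs these as nodes joined by edges
        state = node(state)
    return state
```

The payoff of the graph form over this loop is branching and retries per node, plus the per-node tracing that LangSmith records.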

Clinical Trial Matchmaker — FHIR + MCP

Hackathon project. MCP server that reads a patient's FHIR R4 bundle, queries ClinicalTrials.gov for recruiting trials, and uses Claude to reason about plain-English inclusion/exclusion criteria against the patient's actual clinical profile, returning a ranked, scored, explained list directly inside the clinician's workflow. Targets the discovery gap behind the 3–5% trial enrollment rate, despite roughly half of patients being medically eligible. SHARP context propagation auto-injects patient context from the EHR. Patient-facing summaries in 7 languages.

Year 2026
Event Agents Assemble: Healthcare AI Endgame · Devpost
Stack FastMCP · Claude API · FHIR R4 · ClinicalTrials.gov API · Pydantic · Docker
github.com/rchhabra13/fhir-trial-agent ↗
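Ranking trials comes down to aggregating per-criterion judgments: in the project Claude judges each criterion against the FHIR bundle, and the scores are then combined. A hypothetical aggregation (the real scoring scheme may differ):

```python
def score_trial(inclusion: dict[str, bool], exclusion: dict[str, bool]) -> float:
    """Rank a trial by the fraction of inclusion criteria the patient meets.

    Any triggered exclusion criterion disqualifies the trial outright,
    which mirrors how exclusion criteria work clinically.
    """
    if any(exclusion.values()):
        return 0.0
    if not inclusion:
        return 0.0
    return sum(inclusion.values()) / len(inclusion)
```

Keeping the per-criterion booleans around (not just the final score) is what makes the "explained list" possible: each criterion maps to a sentence in the output.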

Hospital Readmission Prediction

End-to-end ML pipeline predicting 30-day diabetic patient readmissions across 101,766 encounters (UCI). Four-phase methodology: preprocessing + PCA, K-Means and hierarchical clustering to surface patient risk groups, SMOTE-balanced classification, and full evaluation. Six classifiers benchmarked — Logistic Regression recommended: F1 0.2488, ROC-AUC 0.629, Recall 52.8%. Random Forest hit 88.8% accuracy but caught only 3.2% of true readmissions, illustrating why recall matters most in clinical settings. Stevens MIS 637.

Year 2026
Role Classification & Evaluation
Stack Python · scikit-learn · imbalanced-learn · PCA · K-Means · pandas · Jupyter
github.com/rchhabra13/hospital-readmission ↗
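The accuracy-versus-recall gap above is worth making concrete: with roughly 11% of encounters readmitted, a classifier that almost never predicts "readmit" scores high accuracy while missing nearly every true case. The counts below are illustrative, not the project's actual confusion matrix:

```python
def accuracy_recall(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Accuracy and recall from raw confusion-matrix counts."""
    total = tp + fp + fn + tn
    return (tp + tn) / total, tp / (tp + fn)


# 10,000 encounters, ~11% readmitted, a model that almost always says "no":
# it catches only 35 of 1,100 true readmissions yet looks accurate overall.
acc, rec = accuracy_recall(tp=35, fp=80, fn=1065, tn=8820)
```

Here accuracy lands near 88.6% while recall is about 3.2%, which is exactly the Random Forest failure mode the writeup describes and why Logistic Regression's 52.8% recall wins for this clinical use case.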

Experience.

09 / 2025 – Present
AriesView
Real Estate AI SaaS · Boston, MA
AI/ML Engineer Intern
Architected production RAG: Weaviate Hybrid Search + Neo4j graph retrieval. +40% retrieval accuracy, −30% hallucination rate on internal eval set.
Designed semantic chunking + Cohere re-ranking; CI/CD on Kubernetes; −30% p95 latency via Redis caching.
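Hybrid retrieval in the first bullet fuses keyword (BM25) and vector similarity scores; Weaviate exposes this as an alpha-weighted combination. A simplified sketch, assuming scores are already normalized to [0, 1] (Weaviate normalizes before fusing):

```python
def hybrid_score(bm25: float, vector: float, alpha: float = 0.5) -> float:
    """Alpha-weighted fusion: alpha=1 is pure vector, alpha=0 is pure keyword."""
    return alpha * vector + (1 - alpha) * bm25


def rank(docs: list[tuple[str, float, float]], alpha: float = 0.5) -> list[str]:
    """Order document ids by fused score; each doc is (id, bm25, vector)."""
    return [d for d, *_ in sorted(docs, key=lambda t: -hybrid_score(t[1], t[2], alpha))]
```

Tuning alpha per query class (exact identifiers favor keyword, paraphrases favor vector) is one lever behind retrieval-accuracy gains like the one cited above.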
02 / 2024 – 08 / 2024
Incuwise
Startup Incubator · Delhi, India
Software Development Engineer I
Re-architected monolithic backend → serverless AWS (Lambda, API Gateway) for the incubator's founder management platform. +60% scalability, 99.9% uptime, 10k+ users.
Shipped 4 production apps for portfolio companies. −15% API p95 response time, +25% user engagement.

Stack.

Languages
Python SQL JavaScript
ML & GenAI
PyTorch LangChain LangGraph RAG / GraphRAG vLLM Hugging Face DeepEval
LLMs & Agents
Claude API GPT-4o Llama 3.2 MCP
Cloud & MLOps
AWS SageMaker Lambda Docker Kubernetes Databricks MLflow GitHub Actions
Data & Vector DBs
Weaviate Neo4j MongoDB PySpark Snowflake BigQuery

§ Education

Stevens Institute of Technology
M.S. Machine Learning · 2024 – 2026
Hoboken, NJ
Central University of Haryana
B.Tech. Computer Science · 2024

§ Certifications

AWS Certified ML Engineer · 2025
Databricks Gen AI Engineer · 2026
Databricks Spark Developer · 2025
AWS Certified Developer · 2025