Building production RAG pipelines (Weaviate hybrid HNSW + BM25, RRF fusion k=60, cross-encoder rerank) · shipped on-device Gemma 4 (LiteRT INT4, 87ms p50 TTFT, 23 tok/s decode) · trained nanoGPT from scratch (10.7M params, val_loss=2.85, OpenWebText 8B tokens) · orchestrated multi-agent CrewAI with SEC EDGAR + FAISS retrieval
Temp: 74°C | Power: 312W
Memory: 19103MiB / 81920MiB
PROD RAG_PIPELINE @ ARIESVIEW
cosine θ=0.85
batch_size=512
ef_construction=200
k1=1.2 b=0.75
top-k=40 each
top-20 merged
top-5 rerank
TTL=3600s 67% hit
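The fusion step in the pipeline stats above (top-40 from each retriever, merged to top-20 via Reciprocal Rank Fusion with k=60, then top-5 after rerank) can be sketched in pure Python. Function names and document IDs below are illustrative, not from the actual pipeline:

```python
def rrf_fuse(ranked_lists, k=60, top_n=20):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; the cross-encoder reranks the head of this list.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy rankings standing in for the BM25 and HNSW top-40 lists
bm25_hits = ["d3", "d1", "d7", "d2"]
vector_hits = ["d1", "d9", "d3", "d4"]
merged = rrf_fuse([bm25_hits, vector_hits], k=60, top_n=20)
```

With k=60, a document ranked near the top of both lists (like d1 here) beats one that tops only a single list, which is why RRF is a robust default for lexical + vector fusion.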
RUNNING PROCESSES
Ported Gemma 4 (2B params) to mobile edge inference via the Google LiteRT runtime. Applied 4-bit NF4 weight-only quantization, collapsing the model footprint from 4.8GB to 1.2GB (−75%). Implemented sliding-window attention (window=512), dropping attention memory from O(n²) to O(n). Built a custom KV cache (2048-token rolling buffer) to fit the 6GB LPDDR5 constraint. Profiled on Snapdragon 8 Gen 3: 87ms p50 TTFT, 210ms p99, 23 tok/s decode. Cut cold-start 4× via optimized tokenizer serialization (.pb → FlatBuffer).
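The 2048-token rolling KV cache described above is, at its core, a ring buffer: once full, the oldest token's keys/values are overwritten so memory stays constant regardless of sequence length. A toy sketch (class name and sizes are illustrative, not the LiteRT implementation):

```python
class RollingKVCache:
    """Fixed-capacity ring buffer for per-token key/value entries."""

    def __init__(self, capacity=2048):
        self.capacity = capacity
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.length = 0      # tokens currently cached (<= capacity)
        self.next_slot = 0   # write position; wraps around when full

    def append(self, k, v):
        # Overwrites the oldest entry once the buffer is full.
        self.keys[self.next_slot] = k
        self.values[self.next_slot] = v
        self.next_slot = (self.next_slot + 1) % self.capacity
        self.length = min(self.length + 1, self.capacity)

    def window(self):
        """Return cached (key, value) pairs, oldest first."""
        start = (self.next_slot - self.length) % self.capacity
        idx = [(start + i) % self.capacity for i in range(self.length)]
        return [(self.keys[i], self.values[i]) for i in idx]

cache = RollingKVCache(capacity=4)
for t in range(6):                 # feed 6 tokens into a 4-slot cache
    cache.append(f"k{t}", f"v{t}")
oldest_first = cache.window()      # tokens 2..5 survive; 0 and 1 evicted
```

Pairing this with sliding-window attention works because the attention window never needs tokens older than the cache's capacity.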
Implemented decoder-only transformer from scratch (Karpathy nanoGPT arch): 6L · 6H · 384-dim · 10.7M parameters. Trained on OpenWebText (8B tokens, ~800MB). AdamW: lr=3×10⁻⁴ (cosine decay), β₁=0.9, β₂=0.95, ε=10⁻⁸, wd=0.1. Grad clip=1.0, dropout=0.2. Effective batch=320 via gradient accumulation (5 steps × 64 batch). Achieved val_loss=2.85, perplexity=17.3 at 100K iters. ~18 GPU-hours on 4× RTX 3090 (96GB VRAM total).
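The schedule above (lr=3×10⁻⁴ with cosine decay, effective batch 320 via 5×64 accumulation) follows nanoGPT's scheme. A standalone sketch; the warmup horizon and floor lr here are assumed, not taken from the actual run:

```python
import math

def cosine_lr(it, max_lr=3e-4, min_lr=3e-5, warmup=2000, decay_iters=100_000):
    """Linear warmup, then cosine decay from max_lr to min_lr (nanoGPT-style)."""
    if it < warmup:
        return max_lr * (it + 1) / warmup
    if it > decay_iters:
        return min_lr
    ratio = (it - warmup) / (decay_iters - warmup)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)

# Gradient accumulation: 5 micro-batches of 64 give an effective batch of 320
# without needing 320 sequences in GPU memory at once.
accum_steps, micro_batch = 5, 64
effective_batch = accum_steps * micro_batch
```

In the training loop, losses from the 5 micro-batches are scaled by 1/accum_steps and summed before a single optimizer step, so the gradient matches a true batch of 320.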
4-agent CrewAI system: Researcher, Quant Analyst, Risk Assessor, Report Writer via role-based orchestration. Tool stack: SEC EDGAR full-text search, Yahoo Finance yfinance v0.2, Tavily web search, FAISS over 10K earnings transcripts (all-MiniLM-L6-v2, 384-dim). Pydantic v2 output contracts → JSON → PDF. Parallelized Researcher + Quant agents: e2e latency 90s → 31s p50 (−66%).
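The 90s → 31s win comes from fanning out the independent Researcher and Quant agents instead of running them sequentially. A minimal concurrency sketch with stubbed agents; the real system uses CrewAI's orchestration, not raw futures, and the return payloads here are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def researcher(ticker):
    # Stub: the real agent queries SEC EDGAR full-text search / Tavily here.
    return {"ticker": ticker, "filings": ["10-K", "10-Q"]}

def quant_analyst(ticker):
    # Stub: the real agent pulls prices via yfinance and computes metrics.
    return {"ticker": ticker, "sharpe": 1.4}

def run_parallel(ticker):
    # Both agents are I/O-bound and independent, so they fan out;
    # the Risk Assessor then consumes both results downstream.
    with ThreadPoolExecutor(max_workers=2) as pool:
        research = pool.submit(researcher, ticker)
        quant = pool.submit(quant_analyst, ticker)
        return research.result(), quant.result()

research_out, quant_out = run_parallel("AAPL")
```

End-to-end latency then tracks the slower of the two agents plus the sequential tail (Risk Assessor → Report Writer), rather than the sum of all four.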
Automated full ML lifecycle: DVC data versioning → SageMaker managed spot instances ($0.30/hr vs $2.10/hr on-demand, −86% cost) → MLflow experiment tracking (847 runs, 12K artifacts) → Kubernetes rolling deploys (HPA autoscale, readiness probes, canary 10%→100%). Terraform: 3 VPCs, 2 EKS clusters, ECR. Time-to-production 2wk → 4hr, manual intervention −80%.
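The 10% → 100% canary step above follows the standard pattern: shift a fraction of traffic to the new version, check health, then promote or roll back. A toy sketch of that control loop (stages and error threshold are illustrative; the real rollout is driven by Kubernetes readiness probes and HPA, not application code):

```python
def run_canary(stages, error_rate, max_error=0.01):
    """Walk through traffic stages (fraction served by the new version),
    checking an observed error-rate callback at each step.
    Returns the final fraction the new version ends up serving."""
    current = 0.0
    for fraction in stages:
        if error_rate(fraction) > max_error:
            return current  # roll back: hold at the last healthy fraction
        current = fraction
    return current          # fully promoted

healthy = run_canary([0.10, 0.50, 1.00], error_rate=lambda f: 0.002)
aborted = run_canary([0.10, 0.50, 1.00],
                     error_rate=lambda f: 0.05 if f >= 0.5 else 0.002)
```

In this sketch the healthy run promotes to 100%, while the second run trips the threshold at 50% and stays pinned at the last good stage, mirroring what a probe-gated rollout does.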
PROCESS MONITOR // SKILL_UTILIZATION
COMMIT HISTORY // WORK_EXP
EDUCATION + CERTIFICATIONS
// STDOUT → medium.com/@rrchhabra
Writing about production ML systems, RAG architectures, on-device inference, and LLM evaluation pipelines. No tutorials — only battle-tested engineering from real deployments.
READ_ARTICLES →