// RISHI CHHABRA :: AI/ML ENGINEER :: KERNEL INIT
RISHI
CHHABRA
//

Building production RAG pipelines (Weaviate HNSW·BM25, RRF k=60, cross-encoder rerank) · shipped on-device Gemma 4 (LiteRT INT4, 87ms p50 TTFT, 23 tok/s decode) · trained nanoGPT from scratch (10.7M params, val_loss=2.85, OpenWebText 8B tokens) · orchestrated multi-agent CrewAI with SEC EDGAR + FAISS retrieval

PyTorch 2.3 · LangChain / LangGraph · Weaviate HNSW · AWS SageMaker · LiteRT / On-Device · Kubernetes · MLflow · Redis / Valkey
neural-os :: system_monitor
PROCESS        portfolio.exe
PID            31337
STATUS         OPEN_TO_HIRE
LOCATION       New York, NY
UPTIME         00:00:00
CPU UTIL       78%
GPU UTIL (A100)  92%
VRAM USAGE     19.1 / 80 GB
TOKENS/SEC     340
INFER QUEUE    3 jobs
MODELS LOADED  4 active
TOTAL TOKENS   2.4T
COMMITS        1,337
$ nvidia-smi --query
GPU 0: A100-SXM4-80GB
Temp: 74°C | Power: 312W
Memory: 19103MiB / 81920MiB
rishi@neural-os:~$ cat rag_architecture.yaml

PROD RAG_PIPELINE @ ARIESVIEW

// INGESTION PIPELINE — OFFLINE
INGESTION
Raw Docs (PDF · DOCX · HTML)
  → Parser + OCR (PyMuPDF · Tesseract)
  → Semantic Chunker (mpnet-base-v2, cosine θ=0.85)
  → Embedder (ada-002, 1536-dim, batch_size=512)
  → Weaviate HNSW (M=16, ef=128, ef_construction=200)
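The chunking stage above boils down to a greedy pass over sentence embeddings: start a new chunk whenever cosine similarity between neighboring sentences drops below the θ=0.85 threshold. A minimal NumPy sketch of that idea, assuming embeddings are precomputed (mpnet-base-v2 in the production pipeline); the function name and greedy strategy are illustrative, not the exact AriesView implementation:

```python
import numpy as np

def semantic_chunks(sentences, embeddings, threshold=0.85):
    """Greedy semantic chunking: start a new chunk whenever the cosine
    similarity between consecutive sentence embeddings drops below threshold."""
    # Normalize rows so cosine similarity reduces to a dot product.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(emb[i - 1] @ emb[i]) < threshold:
            chunks.append(" ".join(current))  # topic shift: close the chunk
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```
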
// QUERY PIPELINE — ONLINE (p50 <2s e2e)
RETRIEVAL
User Query (natural language)
  → Query Encoder (ada-002 + BM25, k1=1.2, b=0.75)
  → Hybrid Search (Dense ANN + Sparse, top-k=40 each)
  → RRF Fusion (k=60 constant, top-20 merged)
  → Cross-Encoder (MiniLM-L6-v2, top-5 rerank)
  → LLM + Cache (GPT-4o · Redis, TTL=3600s, 67% hit)
RAGAS: faithfulness 0.71→0.89 · answer_relevancy 0.68→0.84 · hallucination 22%→8% (500-sample internal benchmark) · CI/CD deploy 45min→8min via Docker layer caching
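The RRF fusion step above fits in a few lines: each ranker contributes 1/(k + rank) per document, with k=60, and the summed scores decide the merged order. A minimal sketch (function name illustrative):

```python
def rrf_fuse(rankings, k=60, top_n=20):
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank_d).
    `rankings` is a list of ranked doc-id lists (e.g. dense ANN and BM25)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first, truncated to the merge depth.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With k=60 a document ranked #1 by one ranker scores 1/61; appearing mid-list in both rankers typically beats topping just one, which is why the fused top-20 favors consensus hits.
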
rishi@neural-os:~$ ps aux | grep projects

RUNNING PROCESSES

on_device_genai.py — python3.11/litert · PID 1001
$ ./run --project "On-Device GenAI" --verbose
On-Device GenAI App (Gemma 4 + LiteRT)
→ pytorch / litert / gemma-4-2b / snapdragon-8-gen3

Ported Gemma 4 (2B params) to mobile edge inference via the Google LiteRT runtime. Applied 4-bit NF4 weight-only quantization, shrinking the model footprint 4.8GB → 1.2GB (−75%). Implemented sliding-window attention (window=512), cutting attention memory from O(n²) to O(n·w) in sequence length. Built a custom KV cache (2048-token rolling buffer) to fit the LPDDR5 6GB constraint. Profiled on Snapdragon 8 Gen 3: 87ms TTFT p50, 210ms p99, 23 tok/s decode. 4× faster cold start via optimized tokenizer serialization (.pb → FlatBuffer).

87ms   TTFT p50
1.2GB  Model Size
23     Tok/Sec
ON-DEVICE · LITERT · INT4-QUANT · GEMMA-4
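The 2048-token rolling KV cache described above is essentially a ring buffer: once full, each new token overwrites the oldest slot, so memory stays constant regardless of sequence length. A minimal NumPy sketch of that pattern (class name, single-head layout, and eviction policy are illustrative, not the LiteRT internals):

```python
import numpy as np

class RollingKVCache:
    """Fixed-capacity rolling KV cache: retains only the most recent
    `capacity` key/value pairs, overwriting the oldest slot when full."""

    def __init__(self, capacity, head_dim):
        self.capacity = capacity
        self.keys = np.zeros((capacity, head_dim), dtype=np.float32)
        self.values = np.zeros((capacity, head_dim), dtype=np.float32)
        self.length = 0   # tokens currently stored (<= capacity)
        self.pos = 0      # next write slot (ring pointer)

    def append(self, k, v):
        self.keys[self.pos] = k
        self.values[self.pos] = v
        self.pos = (self.pos + 1) % self.capacity
        self.length = min(self.length + 1, self.capacity)

    def window(self):
        """Cached keys in oldest-to-newest order for the attention window."""
        if self.length < self.capacity:
            return self.keys[: self.length]
        return np.roll(self.keys, -self.pos, axis=0)
```
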
autoresearch_train.py — cuda:0,1,2,3 · PID 1002
$ ./run --project "Autoresearch nanoGPT" --verbose
Autoresearch — nanoGPT from Scratch
→ pytorch / cuda-12.4 / openwebtext / 4×rtx-3090

Implemented a decoder-only transformer from scratch (Karpathy's nanoGPT architecture): 6 layers · 6 heads · 384-dim · 10.7M parameters. Trained on OpenWebText (8B tokens, ~800MB). AdamW: lr=3×10⁻⁴ (cosine decay), β₁=0.9, β₂=0.95, ε=10⁻⁸, wd=0.1. Grad clip=1.0, dropout=0.2. Effective batch=320 via gradient accumulation (5 steps × 64 batch). Achieved val_loss=2.85, perplexity=17.3 at 100K iters. ~18 GPU-hours on 4× RTX 3090 (96GB VRAM total).

2.85   Val Loss
17.3   Perplexity
10.7M  Params
TRANSFORMER · FROM-SCRATCH · CUDA · NLP
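The cosine-decay schedule above is the usual nanoGPT shape: linear warmup, cosine decay to a floor, then flat. A sketch under assumptions: the run specifies lr=3×10⁻⁴ with cosine decay over 100K iters, but the warmup length and min_lr floor below are illustrative defaults, not the run's actual values:

```python
import math

def cosine_lr(it, max_lr=3e-4, min_lr=3e-5, warmup=2000, decay_iters=100_000):
    """nanoGPT-style schedule: linear warmup, cosine decay to min_lr, then flat."""
    if it < warmup:
        return max_lr * it / warmup                 # linear ramp 0 -> max_lr
    if it > decay_iters:
        return min_lr                               # flat floor after decay
    ratio = (it - warmup) / (decay_iters - warmup)  # 0 -> 1 over the decay span
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio)) # cosine: 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

The quoted effective batch works the same way as in nanoGPT: 64 sequences per step × 5 accumulation steps = 320 before each optimizer update.
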
financial_agents.py — crewai/gpt-4o · PID 1003
$ ./run --project "GenAI Financial Advisor" --verbose
GenAI Financial Advisor (CrewAI)
→ crewai / gpt-4o / faiss / sec-edgar / yfinance

4-agent CrewAI system: Researcher, Quant Analyst, Risk Assessor, Report Writer via role-based orchestration. Tool stack: SEC EDGAR full-text search, Yahoo Finance yfinance v0.2, Tavily web search, FAISS over 10K earnings transcripts (all-MiniLM-L6-v2, 384-dim). Pydantic v2 output contracts → JSON → PDF. Parallelized Researcher + Quant agents: e2e latency 90s → 31s p50 (−66%).

31s  E2E p50
4    Agents
10K  Doc Index
AGENTIC · CREWAI · GPT-4O · FAISS
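The transcript-retrieval tool above is exact inner-product search over L2-normalized 384-dim vectors. A NumPy stand-in for the FAISS index (a flat inner-product index over normalized embeddings returns the same cosine top-k); the function name is illustrative and the production system uses FAISS over 10K transcripts:

```python
import numpy as np

def topk_cosine(query_emb, doc_embs, k=5):
    """Exact cosine top-k: equivalent to inner-product search over
    L2-normalized vectors (what a flat FAISS IP index computes)."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity per document
    idx = np.argsort(-scores)[:k]        # highest similarity first
    return idx.tolist(), scores[idx].tolist()
```

At 10K documents an exact flat index is typically fast enough that approximate search buys little; ANN structures start paying off at millions of vectors.
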
mlops_pipeline.sh — kubectl/sagemaker · PID 1004
$ ./run --project "MLOps Pipeline" --verbose
End-to-End MLOps Pipeline
→ aws-sagemaker / kubernetes / mlflow / docker / terraform

Automated full ML lifecycle: DVC data versioning → SageMaker managed spot instances ($0.30/hr vs $2.10/hr on-demand, −86% cost) → MLflow experiment tracking (847 runs, 12K artifacts) → Kubernetes rolling deploys (HPA autoscale, readiness probes, canary 10%→100%). Terraform: 3 VPCs, 2 EKS clusters, ECR. Time-to-production 2wk → 4hr, manual intervention −80%.

-80%  Manual Work
4hr   Deploy Time
847   MLflow Runs
MLOPS · KUBERNETES · SAGEMAKER · TERRAFORM
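The headline percentages on this page are plain before/after arithmetic; a tiny helper makes them reproducible against the quoted numbers (spot vs on-demand pricing, CI pipeline time, agent latency):

```python
def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`, rounded to an integer."""
    return round(100 * (before - after) / before)

# Quoted figures: $2.10/hr on-demand vs $0.30/hr spot; 45min -> 8min CI;
# 90s -> 31s agent latency.
spot_savings = pct_reduction(2.10, 0.30)
```
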
rishi@neural-os:~$ htop --sort=CPU --filter=ml_stack

PROCESS MONITOR // SKILL_UTILIZATION

NEURAL_OS PROCESS MANAGER v2.0 · Tasks: 15 running, 0 sleeping
PID   PROCESS                      %CPU  DOMAIN
1001  python==3.11                 96%   LANG
1002  pytorch==2.3.1+cu124         94%   DEEP_LEARNING
1003  transformers==4.44.0         88%   NLP
1004  langchain / langgraph        89%   ORCHESTRATION
1005  litert-runtime (on-device)   87%   ON_DEVICE
1006  weaviate-client==4.6.5       85%   VECTOR_DB
1007  pytorch-lightning==2.3       83%   TRAINING
1008  docker + kubernetes          82%   INFRA
1009  aws-sagemaker / lambda       78%   CLOUD
1010  mlflow==2.15 / wandb         76%   EXPERIMENT
1011  redis / valkey cache         75%   CACHE
1012  sql / postgres / duckdb      80%   DATA
1013  fastapi / grpc / nginx       77%   SERVING
1014  terraform / iac              68%   IAC
1015  c++ / cuda kernels           62%   SYSTEMS
rishi@neural-os:~$ git log --all --oneline --graph

COMMIT HISTORY // WORK_EXP

$ git log --format="%H %s" --author="Rishi Chhabra" -- production/
a3f92b1e (HEAD → main, origin/ariesview)
Author: Rishi Chhabra <rishi.chhabra@outlook.com>
Date: Sep 2025 – Present · Jersey City, NJ
AriesView
AI Engineer Intern · RAG & AI Infrastructure
feat(rag): architect Weaviate HNSW hybrid search — BM25 + dense ANN, RRF fusion k=60, top-5 cross-encoder rerank
perf(chunking): semantic chunking via mpnet-base-v2 reduces hallucination 22%→8%, RAGAS faithfulness +25%
feat(embed): migrate to text-embedding-ada-002 1536-dim, retrieval recall@5 +15%
perf(ci): Docker layer cache + parallel test stages cut pipeline 45min→8min (−82%)
feat(cache): Redis/Valkey write-through TTL=3600s, 67% hit rate, avg latency −340ms
fix(rerank): cross-encoder scoring normalization, eliminate rank inversion on long-tail queries
TECH: Weaviate · LangChain · Redis/Valkey · Docker · CI/CD · Python 3.11 · FastAPI
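The Redis/Valkey caching commit above follows the standard TTL-cache pattern: serve from cache on a fresh hit, otherwise recompute and repopulate with an expiry. A dict-backed sketch of the pattern (class name illustrative; production would use a Redis client's `setex`-style TTL writes, and this sketch recomputes on read rather than reproducing the exact write-through wiring):

```python
import time

class ResponseCache:
    """TTL cache over an expensive backing call (e.g. an LLM request).
    Entries expire after `ttl` seconds and are recomputed on next access."""

    def __init__(self, backing_call, ttl=3600):
        self.backing_call = backing_call  # source of truth, called on miss
        self.ttl = ttl
        self._cache = {}                  # key -> (value, expires_at)

    def get(self, key):
        hit = self._cache.get(key)
        if hit is not None and hit[1] > time.time():
            return hit[0]                 # fresh hit: skip the expensive call
        value = self.backing_call(key)    # miss or expired: recompute
        self._cache[key] = (value, time.time() + self.ttl)
        return value
```
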
d4e81c27 (origin/incuwise)
Author: Rishi Chhabra <rishi.chhabra@outlook.com>
Date: Feb 2024 – Aug 2024 · New Delhi, India
Incuwise
Software Development Engineer I · Full-Stack & Cloud
feat(infra): migrate monolith → serverless AWS Lambda — cost −60%, uptime 99.9%, cold-start <800ms
perf(api): Node.js async handlers + connection pooling reduce p95 latency 180ms→153ms (−15%) at 10K DAU
feat(deploy): ship 4 production apps — Flutter mobile + Node.js backends, engagement +25%
ref(arch): SQS event-driven message queue, decouple services, scalability +60% under load
TECH: AWS Lambda · SQS · Node.js · Flutter · DynamoDB · Serverless Framework
rishi@neural-os:~$ cat /etc/education.conf

EDUCATION + CERTIFICATIONS

🎓
Stevens Institute of Technology
M.S. in Machine Learning
Hoboken, NJ · 2024 – 2026
GPA: 3.5 / 4.0
Coursework: Deep Learning · NLP · Computer Vision · Probabilistic ML · Optimization
🏛️
Central University of Haryana
B.Tech. in Computer Science
India · 2020 – 2024
Coursework: Data Structures · Algorithms · OS · DBMS · Computer Networks
Databricks Certified Generative AI Engineer Associate · 2026
AWS Certified Machine Learning Engineer – Associate
Databricks Certified Developer for Apache Spark – Associate
AWS Certified Developer – Associate

// STDOUT → medium.com/@rrchhabra

Writing about production ML systems, RAG architectures, on-device inference, and LLM evaluation pipelines. No tutorials — only battle-tested engineering from real deployments.

READ_ARTICLES →
rishi@neural-os:~$ ./send_message --to rishi

ESTABLISH CONNECTION

rishi@neural-os:~$ ./compose_message