Senior AI Engineer · AI Platform · LLM Systems

Senior AI Engineer building production-grade agentic systems.

I turn messy human workflows into reliable, observable AI software across LLM orchestration, evaluation, analytics, deck automation, ML infrastructure, search, and computer vision.

Live System Pulse

Representative production AI trace

Sanitized representative trace. Customer data, private prompts, and internal implementation details omitted.

schema.normalize_survey · 184ms · db · success · Survey columns mapped into typed analysis inputs.
planner.expand_questions · 912ms · premium-reasoning · success · 14 insight tasks generated with dependency boundaries.
llm.generate_python · 2.8s · sonnet-class · success · Python analysis code emitted for sandbox execution.
sandbox.exec_analysis · 5.2s · sandbox · success · Persistent sandbox reused across report tasks.
judge.recompute_metric · 3.4s · fast-verifier · verified · Independent verification recomputed the denominator.
chart.score_visual_quality · 730ms · chart · verified · Chart passed visual and semantic thresholds: 0.89.
pptx.render_native_slide · 1.1s · render · success · Slide compiled from deck IR to native PowerPoint.
router.downgrade_model · saved 62% · tool · saved · Low-risk summarization moved to cheaper model tier.

How I think

Production AI is a systems discipline.

The best AI systems are observable, evaluable, controllable, and useful under real constraints.

Production agents need explicit control flow.

Free-form loops are useful for prototypes; production workflows need debuggable state, retries, dependency control, and observable execution.

Evidence: Built DAG-based orchestration for parallel AI insight execution.
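The orchestration pattern above can be sketched as a minimal DAG runner. All task names here are illustrative, not the production system: tasks declare their dependencies, ready tasks fan out in parallel, and each node resolves at a clear failure boundary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Minimal illustrative DAG runner: each task names the tasks it depends on,
# and independent tasks run in parallel once their dependencies finish.
def run_dag(tasks: dict[str, Callable[[dict], object]],
            deps: dict[str, list[str]]) -> dict[str, object]:
    results: dict[str, object] = {}
    pending = set(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            # Tasks whose dependencies are all complete are ready to run.
            ready = [t for t in pending
                     if all(d in results for d in deps.get(t, []))]
            if not ready:
                raise RuntimeError("cycle or unsatisfiable dependency")
            futures = {t: pool.submit(tasks[t], results) for t in ready}
            for t, f in futures.items():
                results[t] = f.result()  # surfaces per-node failures at a clear boundary
            pending -= set(ready)
    return results

# Usage: two independent insight tasks fan out from "load", then a report joins them.
out = run_dag(
    {
        "load": lambda r: [1, 2, 3],
        "mean": lambda r: sum(r["load"]) / len(r["load"]),
        "max": lambda r: max(r["load"]),
        "report": lambda r: f"mean={r['mean']}, max={r['max']}",
    },
    {"mean": ["load"], "max": ["load"], "report": ["mean", "max"]},
)
print(out["report"])  # mean=2.0, max=3
```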

Every LLM output needs an evaluation path.

If a system cannot verify outputs, it cannot be trusted for business-critical work.

Evidence: Implemented independent judge verification with separate sandbox execution.
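A minimal sketch of that verification pattern, with hypothetical metric names: the judge does not inspect the claim's reasoning, it independently recomputes the value, including the denominator, from raw rows and compares.

```python
# Illustrative analysis path (in production this could be LLM-emitted code).
def analysis_claim(rows: list[dict]) -> dict:
    positives = sum(1 for r in rows if r["score"] >= 4)
    return {"metric": "promoter_rate", "value": positives / len(rows)}

# Independent judge: recompute the metric from raw rows, denominator included,
# and accept the claim only if the values agree within tolerance.
def judge_recompute(rows: list[dict], claim: dict, tol: float = 1e-9) -> bool:
    denom = len(rows)
    positives = sum(1 for r in rows if r["score"] >= 4)
    recomputed = positives / denom
    return abs(recomputed - claim["value"]) <= tol

rows = [{"score": s} for s in (5, 4, 2, 1)]
claim = analysis_claim(rows)
print(judge_recompute(rows, claim))  # True
```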

Observability is part of the product.

LLM systems are not production-ready until prompts, calls, traces, failures, latency, cost, and business correctness can be inspected.

Evidence: Unified agents with Langfuse, OpenTelemetry, structured traces, and cost visibility.
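The span shape that makes this inspectable can be illustrated with a generic recorder. This is a simplified stand-in, not the Langfuse or OpenTelemetry API: every step emits a name, status, latency, and cost record that can be queried after the run.

```python
import time
from contextlib import contextmanager

# Simplified trace recorder (stand-in for Langfuse/OTel, not their API):
# every step emits a span with name, status, latency, and cost so failures
# and spend can be inspected per node.
SPANS: list[dict] = []

@contextmanager
def span(name: str, cost_usd: float = 0.0):
    start = time.perf_counter()
    record = {"name": name, "status": "success", "cost_usd": cost_usd}
    try:
        yield record
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
        SPANS.append(record)

# Usage: wrap each pipeline step; spans accumulate whether or not steps fail.
with span("schema.normalize_survey"):
    pass
with span("llm.generate_python", cost_usd=0.004):
    pass

print([s["name"] for s in SPANS])
```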

Intermediate representations make AI systems debuggable.

Direct generation is fragile. IRs create inspectable boundaries between reasoning, rendering, and export.

Evidence: Built memo-to-deck and HTML-to-native-PPTX pipeline around a deck IR.

Cost and latency are product features.

A system that works but cannot be afforded or debugged is not production-ready.

Evidence: Reduced ML infra costs by 10x and designed reusable sandbox execution patterns.
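The routing idea behind trace entries like router.downgrade_model can be sketched as a risk-tiered chooser. Tier names and per-token prices below are illustrative, not real rates: low-risk summarization drops to a cheaper tier, and the saving is measurable per call.

```python
# Illustrative model tiers with example $ per 1M tokens (not real prices).
TIERS = {"premium": 15.0, "standard": 3.0, "economy": 0.25}

def route(task_type: str, risk: str) -> str:
    # High-risk work always gets the strongest tier; cheap, well-bounded
    # task types with low risk get downgraded to the economy tier.
    if risk == "high":
        return "premium"
    if task_type in {"summarize", "reformat"} and risk == "low":
        return "economy"
    return "standard"

def savings_vs_premium(tier: str) -> float:
    return 1 - TIERS[tier] / TIERS["premium"]

tier = route("summarize", "low")
print(tier, round(savings_vs_premium(tier) * 100))  # economy 98
```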

A good AI system knows when not to use AI.

Use deterministic code, typed schemas, rules, and verification paths where they are more reliable than generation.

Evidence: Separated generated analysis from sandbox execution, typed artifacts, and independent verification.
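One concrete instance of using rules where rules win, with a hypothetical helper: column typing here is a deterministic function rather than a model call, so it never hallucinates and is trivially testable.

```python
# Deterministic column typing for survey ingestion (illustrative heuristic,
# not the production rule set): no model call, no hallucination risk.
def infer_column_type(values: list[str]) -> str:
    def is_number(v: str) -> bool:
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all(is_number(v) for v in values):
        return "numeric"
    # Few distinct values relative to row count suggests a categorical column.
    if len(set(values)) <= max(5, len(values) // 10):
        return "categorical"
    return "free_text"

print(infer_column_type(["4", "5", "3.5"]))     # numeric
print(infer_column_type(["yes", "no", "yes"]))  # categorical
```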

Decision Theater

What was rejected matters as much as what shipped.

Senior engineering signal comes from tradeoffs: control flow, verification, cost, artifact boundaries, and recovery behavior.

Decision fork

Free-form agents vs explicit DAG

The workflow needed parallel execution and reliable recovery, not just autonomous behavior.

Free-form autonomous loop

Pros
  • Fast to prototype
  • Flexible exploration
Cons
  • Hard to debug
  • Hard to parallelize
  • Unclear retry boundaries

Explicit DAG execution

Pros
  • Deterministic dependencies
  • Node-level observability
  • Parallel execution
  • Clear retries
Cons
  • More upfront structure
  • Requires domain modeling

Chosen: Explicit DAG orchestration. Production workflows need predictable execution and debugging more than theatrical autonomy.

Decision fork

LLM-only insights vs code-backed analysis

Survey analytics cannot rely on plausible natural language when denominators and filters matter.

Ask LLM from summaries

Pros
  • Lower engineering complexity
  • Fast response
Cons
  • Hallucinated metrics
  • Unsupported conclusions
  • Weak audit trail

Generate and execute Python

Pros
  • Evidence-backed outputs
  • Inspectable calculations
  • Better validation hooks
Cons
  • Sandboxing required
  • More latency and orchestration

Chosen: LLM-generated Python with sandbox execution. For business reporting, numerical correctness matters more than generation convenience.
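The chosen path can be sketched minimally, with illustrative flags and no production hardening: generated Python runs in a separate interpreter process with a hard timeout, and only its printed output crosses the boundary back into the pipeline.

```python
import os
import subprocess
import sys
import tempfile

# Minimal sandbox sketch (illustrative, not a full isolation story):
# run generated code in a separate interpreter with a hard timeout.
def run_in_sandbox(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, no user site-packages
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode == 0, proc.stdout.strip() or proc.stderr.strip()
    except subprocess.TimeoutExpired:
        return False, "timeout"
    finally:
        os.unlink(path)

# Usage: a hypothetical generated snippet computes a rate; only stdout is trusted.
generated = "rows = [5, 4, 2, 1]\nprint(sum(1 for s in rows if s >= 4) / len(rows))"
ok, out = run_in_sandbox(generated)
print(ok, out)  # True 0.5
```

A real deployment would add filesystem and network restriction on top of process isolation; the timeout and exit-code boundary shown here are the debuggable core.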

See all decisions

Work

Breadth after flagship depth.

LLM systems, infrastructure, search, computer vision, and low-level AR performance.

Epic! for Kids

ML Infrastructure Rescue

Took ownership of production ML systems after layoffs and reduced cost, complexity, and operational risk.

ML infra · Kubernetes · Search · Recommendations
Read case study

Tangible Play / Osmo

Computer Vision Product Systems

Led CV systems that improved worksheet recognition accuracy and supported interactive learning workflows.

Computer vision · Real-time ML · Java deployment · Education products
Read case study

Whodat

High-Performance AR and Vision

Researched and built performance-sensitive vision primitives before the team joined Osmo through acquisition.

C++ · ORB-SLAM-style vision · Monocular depth
Read case study

Interactive lab

Small simulations of the production AI problems I solve.

Production AI systems fail in the details: traces, costs, retries, verification gaps, and artifact boundaries. These challenges show how I think about those failure modes.

Interview me

Ask the questions a senior AI screen would ask.

Every answer cites evidence, so a screen can move from claims to proof quickly.

architecture

Why should we not just use LangGraph for orchestration?

Best for: CTO or AI platform lead

cost-infra

How do you control LLM and ML infrastructure costs?

Best for: Founder or VP Engineering

Hiring fit matrix

A recruiter can pitch it. A CTO can interrogate it.

Each signal is tied to a concrete evidence path instead of a generic skill label.

Can architect LLM systems end to end: Agentic market research platform from raw survey data to verified insights, charts, and PPTX decks. (Resume verified)
Can design production agent workflows: DAG orchestration, sandbox execution, independent judge verification, and observable node boundaries. (Resume verified)
Can build full-stack AI products: REST APIs, SSE streaming, Deck IR workflows, and multi-provider model routing in one product platform. (Resume verified)
Understands observability: OpenTelemetry, Langfuse, structured traces, cost visibility, and inspectable execution boundaries. (Resume verified)
Understands evals and reliability: Judge verification and multi-threshold chart quality scoring around generated analysis artifacts. (Resume verified)
Can own infra and reduce costs: 10x ML platform cost reduction, 100x Kubernetes pod reduction, and 99% spot error reduction. (Resume verified)
Has shipped ML beyond LLM demos: Computer vision systems for education products, including a 93% to 98% worksheet accuracy improvement. (Resume verified)
Can operate as senior IC: Owned platform-level architecture decisions across orchestration, observability, APIs, and artifact generation. (Resume verified)
Has low-level performance depth: C++ ORB detector 20% faster than the ORB-SLAM baseline. (Resume verified)

Contact

Need someone to own production AI architecture?

I am best suited to senior AI/platform, LLM systems, and founding AI engineer conversations where reliability, evals, observability, and cost matter.