Senior AI Engineer · AI Platform · LLM Systems

Senior AI Engineer building production-grade agentic systems.

I turn messy human workflows into reliable, observable AI software across LLM orchestration, evaluation, analytics, deck automation, ML infrastructure, search, and computer vision.

Live System Pulse

Representative production AI trace

Sanitized representative trace. Customer data, private prompts, and internal implementation details omitted.

schema.normalize_survey · 184ms · db · success · Survey columns mapped into typed analysis inputs.
planner.expand_questions · 912ms · premium-reasoning · success · 14 insight tasks generated with dependency boundaries.
llm.generate_python · 2.8s · sonnet-class · success · Python analysis code emitted for sandbox execution.
sandbox.exec_analysis · 5.2s · sandbox · success · Persistent sandbox reused across report tasks.
judge.recompute_metric · 3.4s · fast-verifier · verified · Independent verification recomputed the denominator.
chart.score_visual_quality · 730ms · chart · verified · Chart passed visual and semantic thresholds: 0.89.
pptx.render_native_slide · 1.1s · render · success · Slide compiled from deck IR to native PowerPoint.
router.downgrade_model · saved 62% · tool · saved · Low-risk summarization moved to cheaper model tier.

How I think

Production AI is a systems discipline.

The best AI systems are observable, evaluable, controllable, and useful under real constraints.

Production agents need explicit control flow.

Free-form loops are useful for prototypes; production workflows need debuggable state, retries, dependency control, and observable execution.

Evidence: Built DAG-based orchestration for parallel AI insight execution.
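The orchestration pattern above can be sketched as a minimal DAG runner. All task names here are illustrative, not the production system: tasks declare their dependencies, ready tasks fan out in parallel, and each node resolves at a clear failure boundary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Minimal illustrative DAG runner: each task names the tasks it depends on,
# and independent tasks run in parallel once their dependencies finish.
def run_dag(tasks: dict[str, Callable[[dict], object]],
            deps: dict[str, list[str]]) -> dict[str, object]:
    results: dict[str, object] = {}
    pending = set(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            # Tasks whose dependencies are all complete are ready to run.
            ready = [t for t in pending
                     if all(d in results for d in deps.get(t, []))]
            if not ready:
                raise RuntimeError("cycle or unsatisfiable dependency")
            futures = {t: pool.submit(tasks[t], results) for t in ready}
            for t, f in futures.items():
                results[t] = f.result()  # surfaces per-node failures at a clear boundary
            pending -= set(ready)
    return results

# Usage: two independent insight tasks fan out from "load", then a report joins them.
out = run_dag(
    {
        "load": lambda r: [1, 2, 3],
        "mean": lambda r: sum(r["load"]) / len(r["load"]),
        "max": lambda r: max(r["load"]),
        "report": lambda r: f"mean={r['mean']}, max={r['max']}",
    },
    {"mean": ["load"], "max": ["load"], "report": ["mean", "max"]},
)
print(out["report"])  # mean=2.0, max=3
```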

Every LLM output needs an evaluation path.

If a system cannot verify outputs, it cannot be trusted for business-critical work.

Evidence: Implemented independent judge verification with separate sandbox execution.
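A minimal sketch of that verification pattern, with hypothetical metric names: the judge does not inspect the claim's reasoning, it independently recomputes the value, including the denominator, from raw rows and compares.

```python
# Illustrative analysis path (in production this could be LLM-emitted code).
def analysis_claim(rows: list[dict]) -> dict:
    positives = sum(1 for r in rows if r["score"] >= 4)
    return {"metric": "promoter_rate", "value": positives / len(rows)}

# Independent judge: recompute the metric from raw rows, denominator included,
# and accept the claim only if the values agree within tolerance.
def judge_recompute(rows: list[dict], claim: dict, tol: float = 1e-9) -> bool:
    denom = len(rows)
    positives = sum(1 for r in rows if r["score"] >= 4)
    recomputed = positives / denom
    return abs(recomputed - claim["value"]) <= tol

rows = [{"score": s} for s in (5, 4, 2, 1)]
claim = analysis_claim(rows)
print(judge_recompute(rows, claim))  # True
```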

Observability is part of the product.

LLM systems are not production-ready until prompts, calls, traces, failures, latency, cost, and business correctness can be inspected.

Evidence: Unified agents with Langfuse, OpenTelemetry, structured traces, and cost visibility.
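The span shape that makes this inspectable can be illustrated with a generic recorder. This is a simplified stand-in, not the Langfuse or OpenTelemetry API: every step emits a name, status, latency, and cost record that can be queried after the run.

```python
import time
from contextlib import contextmanager

# Simplified trace recorder (stand-in for Langfuse/OTel, not their API):
# every step emits a span with name, status, latency, and cost so failures
# and spend can be inspected per node.
SPANS: list[dict] = []

@contextmanager
def span(name: str, cost_usd: float = 0.0):
    start = time.perf_counter()
    record = {"name": name, "status": "success", "cost_usd": cost_usd}
    try:
        yield record
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
        SPANS.append(record)

# Usage: wrap each pipeline step; spans accumulate whether or not steps fail.
with span("schema.normalize_survey"):
    pass
with span("llm.generate_python", cost_usd=0.004):
    pass

print([s["name"] for s in SPANS])
```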

Intermediate representations make AI systems debuggable.

Direct generation is fragile. IRs create inspectable boundaries between reasoning, rendering, and export.

Evidence: Built memo-to-deck and HTML-to-native-PPTX pipeline around a deck IR.

Cost and latency are product features.

A system that works but cannot be afforded or debugged is not production-ready.

Evidence: Reduced ML infra costs by 10x and designed reusable sandbox execution patterns.
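The routing idea behind trace entries like router.downgrade_model can be sketched as a risk-tiered chooser. Tier names and per-token prices below are illustrative, not real rates: low-risk summarization drops to a cheaper tier, and the saving is measurable per call.

```python
# Illustrative model tiers with example $ per 1M tokens (not real prices).
TIERS = {"premium": 15.0, "standard": 3.0, "economy": 0.25}

def route(task_type: str, risk: str) -> str:
    # High-risk work always gets the strongest tier; cheap, well-bounded
    # task types with low risk get downgraded to the economy tier.
    if risk == "high":
        return "premium"
    if task_type in {"summarize", "reformat"} and risk == "low":
        return "economy"
    return "standard"

def savings_vs_premium(tier: str) -> float:
    return 1 - TIERS[tier] / TIERS["premium"]

tier = route("summarize", "low")
print(tier, round(savings_vs_premium(tier) * 100))  # economy 98
```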

A good AI system knows when not to use AI.

Use deterministic code, typed schemas, rules, and verification paths where they are more reliable than generation.

Evidence: Separated generated analysis from sandbox execution, typed artifacts, and independent verification.
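One concrete instance of using rules where rules win, with a hypothetical helper: column typing here is a deterministic function rather than a model call, so it never hallucinates and is trivially testable.

```python
# Deterministic column typing for survey ingestion (illustrative heuristic,
# not the production rule set): no model call, no hallucination risk.
def infer_column_type(values: list[str]) -> str:
    def is_number(v: str) -> bool:
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all(is_number(v) for v in values):
        return "numeric"
    # Few distinct values relative to row count suggests a categorical column.
    if len(set(values)) <= max(5, len(values) // 10):
        return "categorical"
    return "free_text"

print(infer_column_type(["4", "5", "3.5"]))     # numeric
print(infer_column_type(["yes", "no", "yes"]))  # categorical
```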

Decision Theater

What was rejected matters as much as what shipped.

Senior engineering signal comes from tradeoffs: control flow, verification, cost, artifact boundaries, and recovery behavior.

Decision fork

Free-form agents vs explicit DAG

The workflow needed parallel execution and reliable recovery, not just autonomous behavior.

Free-form autonomous loop

Pros
  • Fast to prototype
  • Flexible exploration
Cons
  • Hard to debug
  • Hard to parallelize
  • Unclear retry boundaries

Explicit DAG execution

Pros
  • Deterministic dependencies
  • Node-level observability
  • Parallel execution
  • Clear retries
Cons
  • More upfront structure
  • Requires domain modeling

Chosen: Explicit DAG orchestration. Production workflows need predictable execution and debugging more than theatrical autonomy.

Decision fork

LLM-only insights vs code-backed analysis

Survey analytics cannot rely on plausible natural language when denominators and filters matter.

Ask LLM from summaries

Pros
  • Lower engineering complexity
  • Fast response
Cons
  • Hallucinated metrics
  • Unsupported conclusions
  • Weak audit trail

Generate and execute Python

Pros
  • Evidence-backed outputs
  • Inspectable calculations
  • Better validation hooks
Cons
  • Sandboxing required
  • More latency and orchestration

Chosen: LLM-generated Python with sandbox execution. For business reporting, numerical correctness matters more than generation convenience.
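The chosen path can be sketched minimally, with illustrative flags and no production hardening: generated Python runs in a separate interpreter process with a hard timeout, and only its printed output crosses the boundary back into the pipeline.

```python
import os
import subprocess
import sys
import tempfile

# Minimal sandbox sketch (illustrative, not a full isolation story):
# run generated code in a separate interpreter with a hard timeout.
def run_in_sandbox(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, no user site-packages
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode == 0, proc.stdout.strip() or proc.stderr.strip()
    except subprocess.TimeoutExpired:
        return False, "timeout"
    finally:
        os.unlink(path)

# Usage: a hypothetical generated snippet computes a rate; only stdout is trusted.
generated = "rows = [5, 4, 2, 1]\nprint(sum(1 for s in rows if s >= 4) / len(rows))"
ok, out = run_in_sandbox(generated)
print(ok, out)  # True 0.5
```

A real deployment would add filesystem and network restriction on top of process isolation; the timeout and exit-code boundary shown here are the debuggable core.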

See all decisions

Work

Breadth after flagship depth.

LLM systems, infrastructure, search, computer vision, and low-level AR performance.

Epic! for Kids

ML Infrastructure Rescue

Took ownership of production ML systems after layoffs and reduced cost, complexity, and operational risk.

ML infra · Kubernetes · Search · Recommendations
Read case study

Tangible Play / Osmo

Computer Vision Product Systems

Led CV systems that improved worksheet recognition accuracy and supported interactive learning workflows.

Computer vision · Real-time ML · Java deployment · Education products
Read case study

Whodat

High-Performance AR and Vision

Researched and built performance-sensitive vision primitives before the team joined Osmo through acquisition.

C++ · ORB-SLAM-style vision · Monocular depth
Read case study

Interactive lab

Small simulations of the production AI problems I solve.

Production AI systems fail in the details: traces, costs, retries, verification gaps, and artifact boundaries. These challenges show how I think about those failure modes.

Interview me

Ask the questions a senior AI screen would ask.

Every answer cites evidence, so a screen can move from claims to proof quickly.

architecture

Why should we not just use LangGraph for orchestration?

Best for: CTO or AI platform lead

cost-infra

How do you control LLM and ML infrastructure costs?

Best for: Founder or VP Engineering

Hiring fit matrix

A recruiter can pitch it. A CTO can interrogate it.

Each signal is tied to a concrete evidence path instead of a generic skill label.

Can architect LLM systems end to end: Agentic market research platform from raw survey data to verified insights, charts, and PPTX decks. (Resume verified)
Can design production agent workflows: DAG orchestration, sandbox execution, independent judge verification, and observable node boundaries. (Resume verified)
Can build full-stack AI products: REST APIs, SSE streaming, Deck IR workflows, and multi-provider model routing in one product platform. (Resume verified)
Understands observability: OpenTelemetry, Langfuse, structured traces, cost visibility, and inspectable execution boundaries. (Resume verified)
Understands evals and reliability: Judge verification and multi-threshold chart quality scoring around generated analysis artifacts. (Resume verified)
Can own infra and reduce costs: 10x ML platform cost reduction, 100x Kubernetes pod reduction, and 99% spot error reduction. (Resume verified)
Has shipped ML beyond LLM demos: Computer vision systems for education products, including a 93% to 98% worksheet accuracy improvement. (Resume verified)
Can operate as senior IC: Owned platform-level architecture decisions across orchestration, observability, APIs, and artifact generation. (Resume verified)
Has low-level performance depth: C++ ORB detector 20% faster than the ORB-SLAM baseline. (Resume verified)

Contact

Need someone to own production AI architecture?

I am best suited to senior AI/platform, LLM systems, and founding AI engineer conversations where reliability, evals, observability, and cost matter.