Report turnaround
Consulting-grade market research reports with insights, charts, and native PPTX output.
Resume verified · Senior AI Engineer · AI Platform · LLM Systems
I turn messy human workflows into reliable, observable AI software across LLM orchestration, evaluation, analytics, deck automation, ML infrastructure, search, and computer vision.
Live System Pulse
Sanitized representative trace. Customer data, private prompts, and internal implementation details omitted.
step · duration · resource · status · note
schema.normalize_survey · 184ms · db · success · Survey columns mapped into typed analysis inputs.
planner.expand_questions · 912ms · premium-reasoning · success · 14 insight tasks generated with dependency boundaries.
llm.generate_python · 2.8s · sonnet-class · success · Python analysis code emitted for sandbox execution.
sandbox.exec_analysis · 5.2s · sandbox · success · Persistent sandbox reused across report tasks.
judge.recompute_metric · 3.4s · fast-verifier · verified · Independent verification recomputed the denominator.
chart.score_visual_quality · 730ms · chart · verified · Chart passed visual and semantic thresholds: 0.89.
pptx.render_native_slide · 1.1s · render · success · Slide compiled from deck IR to native PowerPoint.
router.downgrade_model · saved 62% · tool · saved · Low-risk summarization moved to cheaper model tier.

Proof wall
The strongest claims connect outcomes to systems, constraints, and evidence.
Consulting-grade market research reports with insights, charts, and native PPTX output. · Resume verified
Production ML platform simplification after taking ownership of search, discovery, and recommendations. · Resume verified
Worksheet recognition improvement for education products used by real learners. · Resume verified
Persistent analytics sandbox execution with independent judge verification. · Resume verified

How I think
The best AI systems are observable, evaluable, controllable, and useful under real constraints.
Free-form loops are useful for prototypes; production workflows need debuggable state, retries, dependency control, and observable execution.
Evidence: Built DAG-based orchestration for parallel AI insight execution.
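A minimal sketch of the pattern (names and shapes are illustrative, not the production code): tasks declare dependencies, each ready wave runs in parallel, and any failure surfaces at a named, retryable boundary.

```python
# Minimal sketch of dependency-aware parallel execution; names are
# illustrative, not the production code.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable[[], str]
    deps: set[str] = field(default_factory=set)

def run_dag(tasks: list[Task]) -> dict[str, str]:
    results: dict[str, str] = {}
    pending = {t.name: t for t in tasks}
    with ThreadPoolExecutor(max_workers=4) as pool:
        while pending:
            # A wave is every task whose dependencies are all satisfied.
            wave = [t for t in pending.values() if t.deps.issubset(results)]
            if not wave:
                raise RuntimeError("cycle or unsatisfiable dependency")
            futures = {t.name: pool.submit(t.run) for t in wave}
            for name, fut in futures.items():
                results[name] = fut.result()  # failures surface at a named step
                del pending[name]
    return results

report = run_dag([
    Task("load_survey", lambda: "rows"),
    Task("insight_a", lambda: "a", deps={"load_survey"}),
    Task("insight_b", lambda: "b", deps={"load_survey"}),  # parallel with insight_a
])
```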
If a system cannot verify outputs, it cannot be trusted for business-critical work.
Evidence: Implemented independent judge verification with separate sandbox execution.
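A sketch of the idea behind the judge.recompute_metric step in the trace above, with hypothetical data shapes: the judge recomputes the claimed metric from raw rows instead of trusting the generator's arithmetic.

```python
# Sketch of independent verification: recompute the claimed metric from raw
# rows instead of trusting the generator. Shapes are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Claim:
    metric: str
    numerator: int
    denominator: int
    value: float

def recompute(rows: list[dict], predicate: Callable[[dict], bool]) -> Claim:
    matches = sum(1 for row in rows if predicate(row))
    return Claim("share_of_respondents", matches, len(rows), matches / len(rows))

def verify(claimed: Claim, rows: list[dict],
           predicate: Callable[[dict], bool], tol: float = 1e-6) -> bool:
    independent = recompute(rows, predicate)
    # A wrong denominator (filtered vs. full sample) fails here, which is
    # exactly the class of error plausible prose hides.
    return (claimed.denominator == independent.denominator
            and abs(claimed.value - independent.value) <= tol)
```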
LLM systems are not production-ready until prompts, calls, traces, failures, latency, cost, and business correctness can be inspected.
Evidence: Unified agents with Langfuse, OpenTelemetry, structured traces, and cost visibility.
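An illustrative span record showing the minimum a trace entry needs to answer latency, cost, and correctness questions in one place. Field names are assumptions, not the actual Langfuse or OpenTelemetry schema.

```python
# Illustrative span record tying latency, model cost, and outcome to one
# trace ID. Field names are assumptions, not the Langfuse or OTel schema.
import time
import uuid
from dataclasses import dataclass, asdict

@dataclass
class Span:
    trace_id: str
    name: str
    model: str | None
    latency_ms: float
    cost_usd: float
    status: str  # "success" | "verified" | "error"

def traced(trace_id: str, name: str, fn, model=None, cost_usd=0.0):
    start = time.monotonic()
    try:
        out = fn()
        status = "success"
    except Exception:
        status = "error"
        raise
    finally:
        span = Span(trace_id, name, model,
                    (time.monotonic() - start) * 1000, cost_usd, status)
        print(asdict(span))  # stand-in for a real exporter
    return out

traced(str(uuid.uuid4()), "sandbox.exec_analysis", lambda: "ok",
       model="sonnet-class", cost_usd=0.004)
```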
Direct generation is fragile. Intermediate representations (IRs) create inspectable boundaries between reasoning, rendering, and export.
Evidence: Built memo-to-deck and HTML-to-native-PPTX pipeline around a deck IR.
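A toy version of that boundary using the real python-pptx library; the IR dataclass is an assumption for illustration, not the production schema.

```python
# Toy deck IR rendered to native PowerPoint with python-pptx (a real
# library). The IR dataclass is an illustrative assumption.
from dataclasses import dataclass
from pptx import Presentation
from pptx.util import Pt

@dataclass
class SlideIR:
    title: str
    bullets: list[str]

def render(deck: list[SlideIR], path: str) -> None:
    prs = Presentation()
    for ir in deck:
        slide = prs.slides.add_slide(prs.slide_layouts[1])  # title + content
        slide.shapes.title.text = ir.title
        body = slide.placeholders[1].text_frame
        for i, bullet in enumerate(ir.bullets):
            para = body.paragraphs[0] if i == 0 else body.add_paragraph()
            para.text = bullet
            para.font.size = Pt(18)
    prs.save(path)

render([SlideIR("Key finding", ["Denominator independently verified",
                                "Chart quality score: 0.89"])], "report.pptx")
```

Because rendering consumes a typed IR rather than raw model output, a bad slide is a data bug you can diff, not a prompt mystery.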
A system that works but cannot be afforded or debugged is not production-ready.
Evidence: Reduced ML infra costs by 10x and designed reusable sandbox execution patterns.
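One concrete cost lever, echoing the router.downgrade_model step in the trace above. Tier names and prices are illustrative assumptions, not actual rates.

```python
# Sketch of risk-based model routing, echoing the router.downgrade_model
# trace step. Tier names and prices are illustrative assumptions.
TIER_COST = {            # $ per 1M output tokens (illustrative)
    "premium-reasoning": 15.00,
    "sonnet-class": 3.00,
    "cheap-summarizer": 0.25,
}

def route(task_kind: str, risk: float) -> str:
    if task_kind == "planning":
        return "premium-reasoning"   # keep the strong model for planning
    if task_kind == "summarize" and risk < 0.2:
        return "cheap-summarizer"    # low-risk work moves down a tier
    return "sonnet-class"

def savings(baseline: str, routed: str) -> float:
    return 1 - TIER_COST[routed] / TIER_COST[baseline]

assert route("summarize", risk=0.1) == "cheap-summarizer"
```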
Use deterministic code, typed schemas, rules, and verification paths where they are more reliable than generation.
Evidence: Separated generated analysis from sandbox execution, typed artifacts, and independent verification.
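A sketch of the typed boundary, assuming pydantic v2; field names are illustrative.

```python
# Typed artifact boundary between generated analysis and downstream steps,
# assuming pydantic v2. Field names are illustrative.
from pydantic import BaseModel, field_validator

class InsightArtifact(BaseModel):
    question_id: str
    metric: str
    numerator: int
    denominator: int
    value: float

    @field_validator("denominator")
    @classmethod
    def positive_denominator(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("denominator must be positive")
        return v

# Generated code must emit JSON that parses into this model, so a malformed
# result fails at a deterministic boundary instead of corrupting a slide.
artifact = InsightArtifact.model_validate_json(
    '{"question_id": "q7", "metric": "share", '
    '"numerator": 42, "denominator": 120, "value": 0.35}'
)
```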
Featured case study
Raw survey data to verified insights, charts, and consulting-grade PPTX decks.
Market research reporting required analysts to process survey data, write insights, generate charts, validate findings, and assemble polished decks. The bottleneck was not text generation alone; the system needed numerical correctness, artifact quality, observability, and recovery boundaries.
Decision Theater
Senior engineering signal comes from tradeoffs: control flow, verification, cost, artifact boundaries, and recovery behavior.
Decision fork
The workflow needed parallel execution and reliable recovery, not just autonomous behavior.
Chosen: Explicit DAG orchestration. Production workflows need predictable execution and debugging more than theatrical autonomy.
Decision fork
Survey analytics cannot rely on plausible natural language when denominators and filters matter.
Chosen: LLM-generated Python with sandbox execution. For business reporting, numerical correctness matters more than generation convenience.
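A minimal sketch of that execution boundary: generated code runs in a separate interpreter process with a timeout and must emit JSON. A production sandbox adds isolation, resource limits, and state reuse across tasks; everything here is illustrative.

```python
# Minimal sketch of executing generated analysis code out of process with a
# timeout; a production sandbox adds isolation, resource limits, and state
# reuse across tasks. Everything here is illustrative.
import json
import os
import subprocess
import sys
import tempfile

def exec_generated(code: str, timeout_s: int = 30) -> dict:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, no user site
            capture_output=True, text=True, timeout=timeout_s,
        )
    finally:
        os.unlink(path)
    if proc.returncode != 0:
        raise RuntimeError(f"analysis failed: {proc.stderr[-500:]}")
    return json.loads(proc.stdout)  # contract: the code prints one JSON doc

result = exec_generated('import json; print(json.dumps({"value": 0.35}))')
```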
Work
LLM systems, infrastructure, search, computer vision, and low-level AR performance.
Knit
Architected a production agentic research workflow that transformed analyst-heavy reporting into a verified AI execution pipeline.
Epic! for Kids
Took ownership of production ML systems after layoffs and reduced cost, complexity, and operational risk.
Tangible Play / Osmo
Led CV systems that improved worksheet recognition accuracy and supported interactive learning workflows.
Whodat
Researched and built performance-sensitive vision primitives before the team transitioned to Osmo following its acquisition.
Interactive lab
Production AI systems fail in their traces, cost controls, retry paths, verification gaps, and artifact boundaries. These challenges show how I think about those failures.
Interview me
Every answer cites evidence, so a screening conversation can move from claims to proof quickly.
architecture · Best for: CTO or AI platform lead
cost-infra · Best for: Founder or VP Engineering
Hiring fit matrix
Each signal is tied to a concrete evidence path instead of a generic skill label.
Contact
I am best suited to senior AI/platform, LLM systems, and founding AI engineer conversations where reliability, evals, observability, and cost matter.