Knit · May 2025 – April 2026
Agentic Market Research Platform
Raw survey data to verified insights, charts, and consulting-grade PPTX decks.
Role: Senior AI Engineer, principal architect for the India AI team
Executive summary
Architected a production agentic research workflow that transformed analyst-heavy reporting into a verified AI execution pipeline.
Problem and constraints
Market research reporting required analysts to process survey data, write insights, generate charts, validate findings, and assemble polished decks. The bottleneck was not text generation alone; the system needed numerical correctness, artifact quality, observability, and recovery boundaries.
- Insights needed numerical correctness, not fluent guesses.
- Charts needed to be visually usable and connected to evidence.
- Reports needed consulting-grade native PowerPoint output.
- Workflow execution needed parallelism, retries, and traceability.
- Private prompts, customer data, internal traces, and proprietary implementation details had to remain out of public discussion.
Architecture
System diagrams
Research workflow boundaries
A sanitized view of how raw survey data became verified insights, chart specs, and deck artifacts through explicit boundaries.
- Ingest and normalize schema.
- Break report into typed analysis tasks.
- Generate and run auditable Python.
- Recompute and inspect high-risk outputs.
- Render charts and slides from inspectable structure.
Sanitized architecture diagram. Customer data, private prompts, internal datasets, and proprietary implementation details omitted.
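To make the task boundary concrete, here is a minimal sketch of what a typed analysis task could look like in Python. `TaskKind`, `AnalysisTask`, and `TaskResult` are illustrative names, not the production schema, which stays private.

```python
from dataclasses import dataclass, field
from enum import Enum

class TaskKind(Enum):
    # Illustrative task classes; the real taxonomy is private.
    DESCRIPTIVE_STATS = "descriptive_stats"
    CROSSTAB = "crosstab"
    SIGNIFICANCE_TEST = "significance_test"
    CHART_SPEC = "chart_spec"

@dataclass
class AnalysisTask:
    """One typed unit of work with explicit inputs and a local recovery boundary."""
    task_id: str
    kind: TaskKind
    question_ids: list[str]                      # survey columns this task may read
    depends_on: list[str] = field(default_factory=list)
    max_retries: int = 2                         # retry budget lives on the task

@dataclass
class TaskResult:
    task_id: str
    code: str       # the generated Python, kept for audit
    output: dict    # structured numbers, never free-form prose
    verified: bool  # set only after the independent judge agrees
```

Typing the boundary this tightly is what makes the later stages, parallel execution, verification, and retries, mechanical rather than heuristic.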
Observability and cost loop
Trace spans, model routes, retry budgets, and normalized cost counters make production AI failures debuggable.
- Record latency, model, status, and task class on every span.
- Select model by risk and value.
- Detect node-level cost anomalies.
- Recover only where useful.
- Improve routing and eval policy.
Representative system diagram. Exact company costs and internal traces omitted.
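A sketch of the instrumentation pattern using the public OpenTelemetry Python API. The tracer and counter names, the span attributes, and the injected `run_fn` are illustrative; exporter configuration and the real cost model are omitted.

```python
from opentelemetry import metrics, trace

# Illustrative instrumentation names; exporter config and the real cost model are private.
tracer = trace.get_tracer("research.workflow")
meter = metrics.get_meter("research.workflow")

# Normalized cost counter, tagged per node so node-level anomalies stand out.
llm_cost_usd = meter.create_counter("llm_cost_usd", unit="usd")

def traced_task(task_id: str, task_kind: str, model: str, run_fn, est_cost: float):
    # One span per task makes latency, model, status, and task class queryable.
    with tracer.start_as_current_span("analysis_task") as span:
        span.set_attribute("task.kind", task_kind)
        span.set_attribute("llm.model", model)
        try:
            result = run_fn()   # injected executor; a real one calls the model route
            span.set_attribute("task.status", "ok")
            return result
        except Exception:
            span.set_attribute("task.status", "failed")
            raise
        finally:
            llm_cost_usd.add(est_cost, {"node": task_id, "model": model})
```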
Decision Theater
Decision fork
Free-form agents vs explicit DAG
The workflow needed parallel execution and reliable recovery, not just autonomous behavior.
Chosen: Explicit DAG orchestration. Production workflows need predictable execution and debugging more than theatrical autonomy.
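A minimal sketch of what the explicit-DAG choice looks like in practice, assuming asyncio; the `Node` shape and `attempt` callable are illustrative. Independent nodes run in parallel, and retries bind to node boundaries instead of global agent state.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Node:
    task_id: str
    depends_on: list[str] = field(default_factory=list)
    max_retries: int = 2   # retry budget lives on the node, not in agent state

async def run_with_retries(node: Node, attempt):
    # Recovery is bounded at the task boundary, so failures stay local and debuggable.
    for i in range(node.max_retries + 1):
        try:
            return await attempt(node)
        except Exception:
            if i == node.max_retries:
                raise

async def run_dag(nodes: dict[str, Node], attempt) -> dict[str, object]:
    done: dict[str, object] = {}
    pending = dict(nodes)
    while pending:
        # Every node whose dependencies are all satisfied runs in parallel.
        ready = [n for n in pending.values() if all(d in done for d in n.depends_on)]
        if not ready:
            raise RuntimeError("cycle or missing dependency in workflow DAG")
        results = await asyncio.gather(*(run_with_retries(n, attempt) for n in ready))
        for node, result in zip(ready, results):
            done[node.task_id] = result
            pending.pop(node.task_id)
    return done
```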
Decision fork
LLM-only insights vs code-backed analysis
Survey analytics cannot rely on plausible natural language when denominators and filters matter.
Chosen: LLM-generated Python with sandbox execution. For business reporting, numerical correctness matters more than generation convenience.
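A deliberately minimal sketch of the execute-generated-code step. A real sandbox would add filesystem, network, and resource isolation; the contract shown here, where the generated script prints one JSON object and the caller consumes only structured output, is the part that matters.

```python
import json
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout_s: int = 30) -> dict:
    """Run LLM-generated analysis code out of process; consume only structured output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        [sys.executable, "-I", path],   # -I: isolated mode, ignores env and user site
        capture_output=True,
        text=True,
        timeout=timeout_s,              # runaway generated code is killed, not trusted
    )
    if proc.returncode != 0:
        raise RuntimeError(f"generated code failed: {proc.stderr[-500:]}")
    return json.loads(proc.stdout)      # the script must print exactly one JSON object
```

Keeping the generated script and its JSON output as artifacts is what makes the analysis auditable: any number in the deck can be traced back to the exact code that produced it.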
Decision fork
Self-check vs independent judge
A system that verifies itself can still agree with its own mistakes.
Chosen: Independent judge with separate sandbox execution. Verification is the difference between a demo and a production AI system.
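A minimal sketch of the judge's comparison step, assuming both paths emit flat dicts of named quantities; key names and the tolerance policy are illustrative. The point is that the judge recomputes from raw data in its own sandbox rather than grading the first answer's prose.

```python
import math

def judge_agrees(claimed: dict, recomputed: dict, rel_tol: float = 1e-6) -> bool:
    """Pass an insight only if the independent recomputation matches it."""
    if claimed.keys() != recomputed.keys():
        return False
    for key, value in claimed.items():
        other = recomputed[key]
        if isinstance(value, float) or isinstance(other, float):
            # Numeric claims must agree within tolerance, not merely look similar.
            if not math.isclose(value, other, rel_tol=rel_tol):
                return False
        elif value != other:
            return False
    return True
```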
Evaluation and reliability
- Independent judge verification recomputed results in a separate sandbox path.
- Chart outputs passed multi-threshold quality scoring before deck assembly (a sketch follows this list).
- Retry semantics were tied to task boundaries rather than opaque agent state.
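The chart gate can be sketched roughly as follows; the score axes and thresholds are illustrative, not the production rubric.

```python
from dataclasses import dataclass

@dataclass
class ChartScore:
    readability: float    # label overlap, font size, contrast
    data_fidelity: float  # rendered values match the verified numbers
    evidence_link: float  # every series traces back to a task result

# Illustrative bars; each axis clears its own threshold or the chart is rejected.
THRESHOLDS = {"readability": 0.8, "data_fidelity": 0.99, "evidence_link": 1.0}

def passes_quality_gate(score: ChartScore) -> bool:
    # A single averaged score would let a polished-looking chart hide bad data,
    # so every dimension must clear its own bar before deck assembly.
    return all(getattr(score, axis) >= bar for axis, bar in THRESHOLDS.items())
```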
Observability and debugging
- OpenTelemetry and Langfuse made model calls, spans, failures, and cost inspectable.
- Task-level traces exposed latency, retries, and model routing behavior.
- Generated APIs and SSE streaming made execution state visible to product surfaces (sketched below).
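A sketch of the SSE surface, assuming FastAPI; `RUN_EVENTS` stands in for the real execution-state store, and in production these endpoints were generated rather than hand-written.

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

# Hypothetical in-memory event bus standing in for the real execution-state store.
RUN_EVENTS: dict[str, asyncio.Queue] = {}

async def sse_events(run_id: str):
    queue = RUN_EVENTS.setdefault(run_id, asyncio.Queue())
    while True:
        event = await queue.get()                # e.g. {"task_id": ..., "status": ...}
        yield f"data: {json.dumps(event)}\n\n"   # SSE wire format: data line + blank line
        if event.get("status") == "workflow_complete":
            break

@app.get("/runs/{run_id}/events")
async def stream_run(run_id: str):
    # Server-Sent Events keep product surfaces in sync with task-level execution state.
    return StreamingResponse(sse_events(run_id), media_type="text/event-stream")
```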
Reflection
The durable lesson is that production AI systems are less about an agent loop and more about explicit boundaries: typed inputs, executable artifacts, independent verification, observability, and unit economics.
This case study uses sanitized architecture and representative examples. It excludes confidential prompts, customer data, proprietary datasets, private implementation details, and internal traces.