Trace-to-Fix: how are you actually improving RAG/agents after observability flags issues?
09:58 06 Mar 2026

I’ve been looking at the agent/LLM observability space lately (Langfuse, LangSmith, Arize, Braintrust, Datadog LLM Observability, etc.). Traces are great at showing what failed and where it failed.

What I’m still curious about is the step after that:

How do you go from “I see the failure in the trace” to “I found the fix” in a repeatable way?

Examples of trace-level issues I mean:

  • Retrieval returns low-quality context or misses key docs

  • Citation enforcement fails or the model does not cite what it uses

  • Tool calls have bad parameters or the agent picks the wrong tool

  • Reranking or chunking choices look off in hindsight

Do you:

  • Write custom scripts to sweep params (chunk size, top-k, rerankers, prompts, tool policies)?

  • Add failing traces to a dataset and run experiments?

  • A/B test prompts in production?

  • Maintain a regression suite of traces?

  • Something else?
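For context on the first option, here's roughly what I imagine a "sweep params over failing traces" harness looking like. Everything here is a hypothetical stand-in (the `retrieve` function, the `FAILING_TRACES` dataset, the recall metric), not any particular observability tool's API — just a sketch of turning flagged traces into an offline eval loop:

```python
from itertools import product

# Hypothetical failing traces harvested from an observability tool:
# each has the user query and the doc ids a human marked as
# "should have been retrieved".
FAILING_TRACES = [
    {"query": "refund policy", "expected_docs": {"doc_12", "doc_47"}},
    {"query": "api rate limits", "expected_docs": {"doc_3"}},
]

def retrieve(query, chunk_size, top_k):
    """Stand-in for a real retrieval pipeline (assumption: retrieval
    can be re-run offline with different params). Toy behavior only."""
    all_docs = ["doc_12", "doc_47", "doc_3", "doc_99"]
    return set(all_docs[:top_k])

def recall(trace, retrieved):
    # Fraction of the expected docs that actually came back.
    return len(trace["expected_docs"] & retrieved) / len(trace["expected_docs"])

def sweep(param_grid):
    # Grid-sweep params, score each combo on the failing-trace set,
    # and rank combos by mean recall.
    results = []
    for chunk_size, top_k in product(param_grid["chunk_size"], param_grid["top_k"]):
        scores = [
            recall(t, retrieve(t["query"], chunk_size, top_k))
            for t in FAILING_TRACES
        ]
        results.append({
            "chunk_size": chunk_size,
            "top_k": top_k,
            "mean_recall": sum(scores) / len(scores),
        })
    return sorted(results, key=lambda r: -r["mean_recall"])

if __name__ == "__main__":
    best = sweep({"chunk_size": [256, 512], "top_k": [2, 4]})[0]
    print(best)
```

The same set of traces could then double as the regression suite: once a param combo fixes them, it gets pinned and re-run on every pipeline change.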

Would love to hear the practical workflow people are actually using.

agent observability