Code, Context, and Me
A self-intro. I work on the controls around AI agents - what they can do, what stays human, and how the system fails when something slips. Here’s how I think, shown through the work, including its limits.
A self-intro. I work on the controls around AI agents - what they can do, what stays human, and how the system fails when something slips. Here’s how I think, shown through the work, including its limits.
A rule stated in the markdown is typically followed, but when it’s not, that rule belongs in code that runs before the tool itself.
A guardrail that blocked the workflow it was meant to protect, and a permission rule that was silently dead. Both were bugs in my own config.
Context degradation, silent failure, and miscalibrated escalation look like three separate problems. They’re all the same instrumentation gap.
Every prompt engineering technique exists because a specific failure mode forced it. Here’s the failure taxonomy, not the technique list.
CLAUDE.md, path rules, skills, hooks, and headless CI each have their own post. Nobody writes about how they compose or what happens when the stack drifts.
Everyone says ‘fix your tool descriptions.’ Nobody shows how to diagnose which specific failure caused the misroute.
stop_reason is six lines of code. The real engineering in agentic systems is protecting what goes into the model’s context window.
A 0.91 faithfulness score doesn’t mean your RAG pipeline works. Most eval panels can’t see the layer that’s actually broken.
A decision framework for picking LLM architecture by asking what failure costs first and why agents are the right answer less often than you think.