Posts

Code, Context, and Me

A self-intro. I work on the controls around AI agents: what they can do, what stays human, and how the system fails when something slips. Here’s how I think, shown through the work, including its limits.

Using Hooks as Deterministic Guardrails

A rule written in markdown is usually followed. When it isn’t, the rule belongs in code that runs before the tool itself.

Debugging My Agent Guardrails

A guardrail that blocked the workflow it was meant to protect, and a permission rule that was silently dead. Both were bugs in my own config.

Auditing Agent Self-Truth

Context degradation, silent failure, and miscalibrated escalation look like three separate problems. They’re all the same instrumentation gap.

Fixing Common Failure Modes in LLM Extraction

Every prompt engineering technique exists because a specific failure mode forced it. Here’s the failure taxonomy, not the technique list.

Exploring the Claude Code Configuration Stack

CLAUDE.md, path rules, skills, hooks, and headless CI each have their own post. Nobody writes about how they compose or what happens when the stack drifts.

Debugging Tool Misrouting in LLM Agents

Everyone says ‘fix your tool descriptions.’ Nobody shows how to diagnose which specific failure caused the misroute.

Protecting the Model's Context Window

stop_reason is six lines of code. The real engineering in agentic systems is protecting what goes into the model’s context window.

Pitfalls in RAG Evaluation

A 0.91 faithfulness score doesn’t mean your RAG pipeline works. Most eval panels can’t see the layer that’s actually broken.

Picking the Right LLM Architecture

A decision framework for picking an LLM architecture by first asking what failure costs, and why agents are the right answer less often than you think.