Articles & Journal

Writing

I think in public. Long-form articles work through the engineering ideas in depth; the journal is a dated log of what I'm building and learning along the way: retrieval, multi-agent systems, evaluation, and applied machine learning.

Articles

June 28, 2026 · 7 min read

Seeing Isn't Measuring: Fixing the Design-to-Code Plateau

AI can build a component from a screenshot. It still can't tell you whether it matched. The fix is the same one that applies to every agentic loop: separate the measurement from the judgment.

#Agents #Verification #Design to Code #Claude Code #Frontend

June 23, 2026 · 5 min read

Claude Tag: Anthropic Puts an Agent in the Channel

Anthropic's new Claude Tag turns @Claude into a persistent, shared teammate inside Slack. Here's what it actually does, how it's governed, and why the form factor matters more than the feature list.

#Claude #Agents #Slack #Enterprise AI #Claude Code

June 17, 2026 · 8 min read

Loop Engineering, Defined

The unit of agentic work has moved from the prompt to the loop. A working definition, an anatomy of what a loop is made of, and the one principle that separates loops you can trust from agents that agree with themselves.

#Loop Engineering #Agents #Claude Code #Reliability #AI Engineering

June 16, 2026 · 9 min read

Why /loop Matters in Claude Code

/loop runs a prompt or a slash command on a cadence inside your open Claude Code session, so you stop re-asking 'did it finish yet?' and let the machine do the polling. This is what it is, the three ways to run it, where it earns its keep, and the session-scoped limits that tell you when to reach for durable automation instead.

#Claude Code #Agents #Automation #Workflow

June 9, 2026 · 12 min read

Don't Let the Agent Grade Itself: Verification Gates for Autonomous Claude Code

Running a coding agent unattended is tempting and mostly a trap. The thing that makes it safe isn't a better prompt. It's an external, deterministic gate the model cannot talk its way past. Here is the principle, a working pipeline that embodies it, and the failure modes that matter.

#Agents #Reliability #Claude Code #Evaluation

May 28, 2026 · 4 min read

Retrieval Is Not Grounding: Building RAG That Stays Honest

Fetching the right documents is necessary but not sufficient. Grounding (answers that are actually entailed by the retrieved evidence) is a separate property you have to design for and measure. Here is how I think about the gap, and the evaluation that closes it.

#RAG #Retrieval #Evaluation

March 10, 2026 · 4 min read

Multi-Agent Systems Need Boundaries, Not Bigger Prompts

When an agent system misbehaves, the instinct is to add more instructions. Usually the real fix is structural: explicit states, hard guards, and small tools with narrow contracts. Reliability is an architecture decision, not a prompting one.

#Agents #Reliability #Architecture

Journal

June 3, 2026 · exploring
Notes on chunking: smaller isn't always better
Spent the afternoon re-running a chunking ablation because a retrieval metric drifted and I wanted to know why. The folklore is "smaller chunks → better recall." That is true right…
#RAG #Retrieval
May 18, 2026 · note
An eval harness before the feature
Rule I keep relearning: build the evaluation before the feature, not after. It feels slower. You sit down to add a capability and instead spend the first hour assembling twenty lab…
#Evaluation #Process
April 22, 2026 · reading
Re-reading BM25 in the age of embeddings
Went back to the BM25 literature this week, partly out of nostalgia from my IR research days, partly because a hybrid retriever I'm tuning keeps reminding me how good the old lexic…
#Information Retrieval #Reading