Loop Engineering, Defined

In June 2026, a two-sentence post from Peter Steinberger (stop prompting coding agents, start designing the loops that prompt them) drew several million views and a week of argument. Boris Cherny, who runs Claude Code at Anthropic, had been saying a version of it from the inside: his job now is to write loops, not prompts. Addy Osmani gave it a name people could repeat. "Loop engineering" arrived the way these terms always do: as a slogan first, and a definition never.

Naming a discipline is easy. Defining it is the work, because a fuzzy term lets everyone nod along while building incompatible things. So this is the attempt: what loop engineering actually is, what a loop is made of, and which part of it is load-bearing.

The third move

Loop engineering is the third move in a sequence, and it reads most clearly against the first two.

Prompt engineering optimized the instruction. You had one model call and one shot at it, and the craft was in the phrasing: the examples, the chain-of-thought scaffolding, the careful wording that nudged a single inference toward the answer you wanted. The artifact was a string.

Context engineering, which emerged through 2025, optimized what the model could see. Once systems ran across many turns with retrieval, memory, and tools, the bottleneck stopped being how you asked and became what was in the window when the model answered. The craft moved to assembling that window: which documents, which prior state, which tool results, ordered and compacted so the model had what it needed and little that it didn't. The artifact was an information environment.

Loop engineering optimizes what the model is allowed to do without you. Frontier models crossed a line where a single well-set-up call usually does the right thing. The open question is no longer "will the model understand this instruction" but "can I trust an unsupervised sequence of the model's own actions to converge on something correct, and stop." That is not a prompting problem and it is not a context problem. It is a control problem. The artifact is a system, most often code: a bash orchestrator, a graph, a scheduled pipeline.

Each move sits on top of the last. A loop is still full of prompts and still lives or dies by its context. But the thing you now design (the thing that decides whether the result is reliable) is the iteration structure around the calls.

A working definition

Loop engineering is the discipline of designing the control system in which one or more model calls operate with agency: the actions they can take, the checks applied to their output, what happens on failure, and the conditions under which the loop is allowed to stop.

Read that last clause twice. A loop that cannot stop is the defining failure of the field, and most of the discipline is downstream of getting termination right.

The act-observe-decide-repeat cycle at the center is not new; it traces back to the ReAct pattern out of Princeton and Google. What's new is the insistence (sharpened by the practice some call the Ralph Wiggum loop) that an external check, not the model's own judgment, is what declares the work done.

Anatomy of a loop

Strip a working agentic loop down and you find the same five parts, whether it's three lines of bash around claude -p or a nine-node graph.

The five parts of a loop: an actor proposes, a gate can say no, a repair path feeds failures back, three exits (success, exhaustion, escalation) decide when it stops, and a ledger records every step.

The actor: the model call (or calls) doing the work. Inside the loop the actor should be given real latitude; this is where the model's capability earns its keep.
The gate: the check on the actor's output. This is the part that can say no: a test suite, a type check, a schema validator, a compile step, an independent second judge. The gate decides whether the loop is done, not the actor.
The repair path: what gets fed back when the gate fails. A failing test's output, a stack trace, the validator's specific complaint. The next iteration should see the precise reason it failed, not a vague "try again."
The termination condition: when the loop stops. There are three exits, and a real loop needs all three: success (the gate passes), exhaustion (a budget of iterations, tokens, or wall-clock time runs out), and escalation (the loop hands the problem back to you). A loop with only the first exit runs until your budget or your patience gives out.
The ledger: what you can inspect afterward. Every action, gate result, and repair, logged. Without it you can't distinguish a loop that genuinely succeeded from one that learned to satisfy a weak gate, and you can't improve either.
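
The five parts fit in a few dozen lines. What follows is a minimal sketch, not a reference implementation: `run_loop`, `GateResult`, `LoopOutcome`, and the toy actor and gate are hypothetical names invented for illustration, and the escalation rule (give up when the gate reports the same failure twice in a row) is one assumption among many reasonable ones. A real actor would be a model call; here it is a stub so the control flow is visible.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class GateResult:
    passed: bool
    feedback: str = ""  # the precise reason for failure, fed to the repair path


@dataclass
class LoopOutcome:
    status: str  # "success", "exhausted", or "escalated"
    output: Optional[str]
    ledger: List[dict] = field(default_factory=list)


def run_loop(
    actor: Callable[[str, str], str],   # (task, feedback) -> candidate
    gate: Callable[[str], GateResult],  # the part that can say no
    task: str,
    max_iterations: int = 5,
) -> LoopOutcome:
    ledger: List[dict] = []
    feedback = ""
    previous_feedback = None
    for i in range(1, max_iterations + 1):
        candidate = actor(task, feedback)   # the actor proposes
        result = gate(candidate)            # the gate decides
        ledger.append({                     # the ledger records every step
            "iteration": i,
            "candidate": candidate,
            "passed": result.passed,
            "feedback": result.feedback,
        })
        if result.passed:
            return LoopOutcome("success", candidate, ledger)
        if result.feedback == previous_feedback:
            # Assumed escalation rule: the same failure twice means the
            # loop is stuck, so hand the problem back to a human.
            return LoopOutcome("escalated", candidate, ledger)
        previous_feedback = result.feedback
        feedback = result.feedback          # the repair path
    return LoopOutcome("exhausted", None, ledger)


# Toy stand-ins: a real actor would call a model.
def toy_actor(task: str, feedback: str) -> str:
    return "5" if feedback else "4"


def toy_gate(candidate: str) -> GateResult:
    if candidate.strip() == "5":
        return GateResult(True)
    return GateResult(False, f"expected 2 + 3 = 5, got {candidate}")


outcome = run_loop(toy_actor, toy_gate, "add 2 and 3")
print(outcome.status, len(outcome.ledger))  # success 2
```

Note that the actor never gets to return "done." Every exit is decided by the gate or by the budget, which is the point of the design.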

The gate is the load-bearing wall

This is the part almost everyone underweights, and it's the reason loop engineering is worth taking seriously rather than treating as a productivity hack.

I've written before that you should never let the agent grade its own work. Loop engineering is why that argument matters. A loop is, mechanically, a machine for repeating an action until a check passes, so the check is the entire game. The sharpest line in the recent argument about all this came not from an essay but from a reply in the thread: a loop with nothing in it that can push back is just the model ratifying its own first answer, over and over.

That gives you a clean way to grade your own loops.

Deterministic gates (tests, types, compilation, schema validation) are strong because they sit outside the model and cannot be charmed. The code runs or it doesn't.
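
To make "cannot be charmed" concrete, here is a hedged sketch of a deterministic gate for a made-up task: the actor is asked to write a `slugify` function. The task, the function name `deterministic_gate`, and the test cases are all assumptions for illustration. The gate compiles the candidate, runs it against fixed cases, and returns the exact failure rather than a bare "no." It executes the candidate in-process for brevity; anything real would run untrusted code in a sandbox.

```python
from typing import Tuple


def deterministic_gate(source: str) -> Tuple[bool, str]:
    """Check a candidate `slugify` implementation against fixed cases."""
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return False, f"syntax error on line {exc.lineno}: {exc.msg}"

    namespace: dict = {}
    try:
        exec(code, namespace)  # sketch only: sandbox this in real use
    except Exception as exc:
        return False, f"error while loading candidate: {exc!r}"

    fn = namespace.get("slugify")
    if not callable(fn):
        return False, "candidate does not define a callable `slugify`"

    cases = [("Hello World", "hello-world"), ("  a  b ", "a-b"), ("", "")]
    for given, expected in cases:
        try:
            actual = fn(given)
        except Exception as exc:
            return False, f"slugify({given!r}) raised {exc!r}"
        if actual != expected:
            return False, (
                f"slugify({given!r}) returned {actual!r}, expected {expected!r}"
            )
    return True, "all cases passed"


good = 'def slugify(s):\n    return "-".join(s.lower().split())\n'
bad = 'def slugify(s):\n    return s.lower().replace(" ", "-")\n'

print(deterministic_gate(good)[0])  # True
ok, why = deterministic_gate(bad)
print(ok, why)
```

The second candidate looks plausible and passes the first case. The gate rejects it anyway, and the message it returns is exactly what the repair path should feed back.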

Non-deterministic gates (an LLM reviewer checking against a rubric) are sometimes unavoidable, for things like prose quality or design taste. But they are only as good as the rubric is specific and only as trustworthy as the judge is independent. A vague rubric is a gate that looks like it says no and actually says yes.
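
One way to keep a rubric honest is to break it into yes/no questions and fail the gate on any single "no." The sketch below assumes a hypothetical judge interface, `(draft, question) -> bool`, which in practice would wrap a call to a separate model; `Criterion`, `rubric_gate`, and `stub_judge` are invented names, and the stub uses string checks only so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Criterion:
    name: str
    question: str  # a yes/no question the judge answers about the draft


# Hypothetical judge signature: returns True when the answer is "yes".
Judge = Callable[[str, str], bool]


def rubric_gate(
    draft: str, rubric: List[Criterion], judge: Judge
) -> Tuple[bool, List[str]]:
    """Pass only if every criterion passes; report the ones that failed."""
    failed = [c.name for c in rubric if not judge(draft, c.question)]
    return (not failed, failed)


def stub_judge(draft: str, question: str) -> bool:
    # Stand-in for an independent model call, for demonstration only.
    if "under 50 words" in question:
        return len(draft.split()) < 50
    if "names a rollback step" in question:
        return "rollback" in draft.lower()
    return False  # questions the judge cannot answer fail closed


rubric = [
    Criterion("length", "Is the note under 50 words?"),
    Criterion("rollback", "Does the note names a rollback step?".replace("names", "name")),
]
# The stub matches on substrings, so keep the matching phrase in the question.
rubric[1] = Criterion("rollback", "Is it true that the note names a rollback step?")

print(rubric_gate("Deploy at noon.", rubric, stub_judge))
# (False, ['rollback'])
print(rubric_gate("Deploy at noon. Rollback: redeploy the previous tag.", rubric, stub_judge))
# (True, [])
```

Compare that with a one-item rubric that asks "is this good?" The structure is identical, but the gate has nothing specific to say no to.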

So the engineering instinct should be: push as much verification as you can onto deterministic checks at the boundary of the loop, and let the model be creative only on the inside. The general principle behind it is short enough to keep on a sticky note: the integrity of a loop is exactly the strength of its weakest gate. You do not make an agent reliable by reaching for a smarter model. You make it reliable by engineering the loop that contains it.

Loop smells

Four anti-patterns. Once you've named them, you see them everywhere.

The self-grading loop. The actor and the gate are the same model judging its own output. It will converge, on its own opinion of "done."
The unbounded loop. No iteration budget, no token ceiling, no escalation. It works in the demo and burns money in the dark.
The mega-prompt in a trench coat. One enormous prompt asked to do everything in a single pass, called a "system" because it's long. There's no loop in it (no act, observe, decide, repeat), so there's nothing to verify and nothing to recover from.
The soft gate. A gate that checks the cheap thing instead of the real one: that the code compiles rather than that it's correct, that the output matches a format rather than the goal. A loop optimizes the gate in front of it, not the goal in your head. Weak tests buy you a system that is confidently passing weak tests.
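
The soft gate is easiest to see side by side with the real one. In this sketch the actor is asked to total an invoice and reply in JSON; the task, the `total_cents` field, and both function names are assumptions made up for the example. The soft gate checks the format. The real gate checks the goal.

```python
import json
from typing import List


def soft_gate(raw: str) -> bool:
    """Checks only the format: a JSON object with an integer `total_cents`."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    total = data.get("total_cents")
    return isinstance(total, int) and not isinstance(total, bool)


def real_gate(raw: str, line_items_cents: List[int]) -> bool:
    """Checks the goal: the format is right and the total is actually correct."""
    if not soft_gate(raw):
        return False
    return json.loads(raw)["total_cents"] == sum(line_items_cents)


items = [1999, 501, 1000]  # sums to 3500
wrong = '{"total_cents": 3499}'
right = '{"total_cents": 3500}'

print(soft_gate(wrong))         # True: well-formed, and wrong
print(real_gate(wrong, items))  # False
print(real_gate(right, items))  # True
```

A loop running against `soft_gate` would report success on the first answer. Nothing in the ledger would tell you it was off by a cent.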

What it costs

Loop engineering is not free; it relocates the work rather than removing it. Three costs come with the territory.

Tokens. Unattended loops spend money unattended, and a loose gate that lets a loop run for hours is the expensive version of failure.
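
A budget is cheap to write, which makes its absence hard to excuse. Here is a minimal sketch of one that tracks all three limits; the `Budget` class and its field names are hypothetical, and it assumes each actor call reports how many tokens it used, which depends on the API you are calling.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Budget:
    max_iterations: int
    max_tokens: int
    max_seconds: float
    iterations: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens_used: int) -> None:
        """Record one completed iteration and the tokens it consumed."""
        self.iterations += 1
        self.tokens += tokens_used

    def exhausted(self) -> str:
        """Return the name of the first limit hit, or '' if budget remains."""
        if self.iterations >= self.max_iterations:
            return "iterations"
        if self.tokens >= self.max_tokens:
            return "tokens"
        if time.monotonic() - self.started >= self.max_seconds:
            return "wall-clock"
        return ""


budget = Budget(max_iterations=10, max_tokens=50_000, max_seconds=600)
while not budget.exhausted():
    budget.charge(tokens_used=20_000)  # stand-in for one actor call's usage

print(budget.exhausted(), budget.iterations)  # tokens 3
```

The check runs before each iteration, so the loop can overshoot by at most one call. If that margin matters, check the remaining budget against an estimate before calling the actor.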

Correctness. The loop optimizes the gate, so a weak gate produces work that is verified and wrong, which is worse than work that is obviously broken, because nothing flags it.

Comprehension debt. When a loop ships code while you sleep, you wake up responsible for a system you have never read. The teams that come out ahead use loops to accelerate their own understanding; the ones that fall behind use them to avoid it.

The discipline, in one line

Prompt engineering was about talking to the model. Context engineering was about what the model knows. Loop engineering is about what the model is allowed to do when you are not watching, and that is the question that decides whether an agentic system is a toy or a piece of infrastructure.

The slogan version (stop prompting, start looping) is right but incomplete, because it says what to stop and not what to do well. Doing it well comes down to one discipline that's easy to state and hard to hold: put something in every loop that can tell the model no, and mean it.

Related reading

Don't Let the Agent Grade Itself: Verification Gates for Autonomous Claude Code

Seeing Isn't Measuring: Fixing the Design-to-Code Plateau

Claude Tag: Anthropic Puts an Agent in the Channel