Tuesday, June 23, 2026

Loop Engineering – O’Reilly

The following article originally appeared on Addy Osmani’s blog and is being reposted here with the author’s permission.

Loop engineering is replacing yourself as the one who prompts the agent. You design the system that does it instead. A loop here can be thought of as a recursive goal: you define an objective and the AI iterates until it’s complete. I believe this may be the future of how we work with coding agents. However, it’s still early; I’m skeptical, and you absolutely need to be careful about token costs (usage patterns can vary wildly depending on whether you’re token rich or token poor), so I want to unpack what it is and what it means.

Peter Steinberger recently said: “You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.” Similarly, Boris Cherny, head of Claude Code at Anthropic, said, “I don’t prompt Claude anymore. I have loops running that prompt Claude and figure out what to do. My job is to write loops.”

Okay, so what does any of that mean?

For like two years, the way you got something out of a coding agent was to write a good prompt and share enough context. You type a thing, you read what came back, you type the next thing. The agent is a tool and you’re holding it the whole time, one turn after another. That part is kind of over, or at least some people think it’s going to be.

Now you build a small system that finds the work, hands it out, checks it, writes down what is done, and then decides the next thing. You let that system poke the agents instead of you. I wrote before about the cousins of this. One is agent harness engineering, which is building the environment a single agent runs inside. The other is the factory model, the system that builds the software. Loop engineering sits one floor above the harness. It’s still a harness, but it runs on a timer, it spawns little helpers, and it feeds itself.

The thing that surprised me is that this isn’t really a tool thing anymore. A year ago, if you wanted a loop, you wrote a pile of bash and maintained that pile forever, and it was yours and only yours. Now the pieces just ship inside the products. Steinberger’s list maps almost exactly onto the Codex app, and then almost the same onto Claude Code. Once you notice the shape is the same, you stop arguing about which tool. You just design a loop that works no matter which one you happen to be sitting in.

The five pieces, and then notes

A loop needs five things and then one place to remember stuff. Let me list them first and then map them.

  1. Automations that go off on a schedule and do discovery and triage by themselves
  2. Worktrees so two agents working in parallel don’t step on each other
  3. Skills to write down the project knowledge the agent would otherwise just guess
  4. Plugins and connectors to plug the agent into the tools you already use
  5. Subagents so one of them has the idea and a different one checks it

Then there’s the sixth thing, the memory. It can be a Markdown file, a Linear board, or anything else that lives outside the single conversation and holds what’s done and what’s next. It sounds too dumb to matter. But it’s the same trick every long-running agent depends on, and I went into it in “Long-Running Agents.” The model forgets everything between runs, so the memory needs to be on disk and not in the context. The agent forgets; the repo doesn’t.

Both products have all five now.

| Primitive | Job in the loop | Codex app | Claude Code |
|---|---|---|---|
| Automations | Discovery + triage on a schedule | Automations tab: pick project, prompt, cadence, environment; results land in a Triage inbox; /goal for run-until-done | Scheduled tasks and cron, /loop, /goal, hooks, GitHub Actions |
| Worktrees | Isolate parallel features | Built-in worktree per thread | git worktree, --worktree, isolation: worktree on a subagent |
| Skills | Codify project knowledge | Agent Skills (SKILL.md), invoked with $name or implicitly | Agent Skills (SKILL.md) |
| Plugins and connectors | Connect your tools | Connectors (MCP) plus plugins for distribution | MCP servers plus plugins |
| Subagents | Ideate and verify | Subagents defined as TOML in .codex/agents/ | Task subagents in .claude/agents/, agent teams |
| State | Track what’s done | Markdown or Linear via a connector | Markdown (AGENTS.md, progress files) or Linear via MCP |

The names differ a bit here and there, but the capability is the same. Let me go through them one by one, because honestly the details are where a loop either holds together or quietly leaks everywhere.

Automations, the heartbeat

Automations are what make a loop an actual loop and not just one run you did once. In the Codex app you create one in the Automations tab. You pick the project, the prompt it will run, how often it runs, and whether it runs in your local checkout or on a background worktree. Runs that find something go to a Triage inbox, and runs that find nothing just archive themselves, which is nice. OpenAI uses them internally for boring stuff like daily issue triage, summarizing CI failures, writing commit briefings, and hunting bugs somebody added last week. An automation can also call a skill, which keeps the recurring job maintainable. You fire $skill-name instead of pasting a huge wall of instructions into a schedule that nobody will ever update.
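The inbox-or-archive behavior is simple enough to sketch. This is a hypothetical model of one automation run, not the Codex app’s implementation. The Automation fields mirror the settings described above (project, prompt, cadence, environment), and run_triage is a placeholder for whatever actually executes the prompt or $skill-name.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Automation:
    # Hypothetical fields mirroring the settings described in the text.
    project: str
    prompt: str        # e.g. "$triage-ci": call a skill instead of inlining instructions
    cadence: str       # e.g. "daily"
    environment: str   # "local" checkout or background "worktree"

@dataclass
class Inbox:
    items: list[str] = field(default_factory=list)
    archived_runs: int = 0

def run_once(auto: Automation,
             run_triage: Callable[[Automation], list[str]],
             inbox: Inbox) -> None:
    """Run the automation once; findings go to the inbox, empty runs archive themselves."""
    findings = run_triage(auto)
    if findings:
        inbox.items.extend(f"[{auto.project}] {f}" for f in findings)
    else:
        inbox.archived_runs += 1
```

The useful property is that a quiet run costs you no attention. Only runs with findings ever reach you.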

Claude Code gets to the same place through scheduling and hooks. You can run a prompt or a command on an interval with /loop, schedule a cron task, or fire shell commands at certain points in the agent lifecycle with hooks. You can also push the whole thing to GitHub Actions if you want it to keep running after you close the laptop. It’s exactly the same idea: you define an autonomous task, you give it a cadence, and the findings come to you, so you aren’t the one going around checking.
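Whichever scheduler you use (a cron task, /loop, or a GitHub Actions schedule), the core of a cadence is one question: when is the next run? Here is a minimal standard-library sketch for a once-a-day cadence at a fixed hour, roughly what the cron expression “0 9 * * *” means. Real schedulers also handle time zones, jitter, and missed runs, and this sketch ignores all of that.

```python
from datetime import datetime, timedelta

def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next hour:minute strictly after `now` (today if still ahead, else tomorrow)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

print(next_daily_run(datetime(2026, 6, 23, 10, 30), hour=9))  # 2026-06-24 09:00:00
```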

There’s a second in-session primitive worth knowing, and it’s closer to what this whole post is about. /loop reruns on a cadence. /goal keeps going until a condition you wrote is actually true. After every turn, a separate small model checks whether you’re done, so the agent that wrote the code isn’t the one grading it. You give it something like “all tests in test/auth pass and lint is clean” and walk away. Codex has the same thing, also called /goal: it keeps working across turns until a verifiable stopping condition holds, and you can pause, resume, and clear it. Same primitive, both tools, which is kind of the pattern for this whole article.
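The shape of /goal, as described above, is a loop whose stop condition is judged by something other than the worker. Here is a minimal sketch of that control flow. work_turn and check_done are hypothetical callables standing in for the coding agent and the separate checker model. The max_turns budget is my own assumption, added because an unattended loop should not run forever.

```python
from typing import Callable

def run_until_goal(
    goal: str,
    work_turn: Callable[[str, int], str],    # the agent: does one turn, returns a transcript
    check_done: Callable[[str, str], bool],  # a *different* judge: is the goal actually met?
    max_turns: int = 10,
) -> tuple[bool, int]:
    """Alternate worker turns with an independent check. Returns (goal_met, turns_used)."""
    for turn in range(1, max_turns + 1):
        transcript = work_turn(goal, turn)
        if check_done(goal, transcript):
            return True, turn
    return False, max_turns
```

The worker never decides for itself when to stop. Only the checker can end the loop early, and the budget catches the case where it never agrees.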

So that’s the part that surfaces the work. The rest of the loop is what acts on it.

Worktrees, so parallel doesn’t turn into chaos

The moment you run more than one agent, the files start colliding, and that becomes the failure. Two agents writing the same file is exactly the same headache as two engineers committing to the same lines without talking to each other first. A Git worktree fixes it. A worktree is a separate working directory on its own branch that shares the same repo history, so one agent’s edits literally cannot touch the other one’s checkout.
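Giving each agent its own worktree takes one Git command per agent. This sketch builds the `git worktree add -b <branch> <path> <base>` invocations for a batch of agents. The agents/<name> branch naming and the ../wt-<name> directory layout are assumptions. The sketch returns the commands instead of executing them, so you can inspect them first.

```python
import shlex

def worktree_commands(agent_names: list[str], base: str = "main") -> list[list[str]]:
    """One isolated checkout per agent: a new branch off `base` in its own directory."""
    cmds = []
    for name in agent_names:
        branch = f"agents/{name}"   # hypothetical branch naming convention
        path = f"../wt-{name}"      # sibling directory outside the main checkout
        cmds.append(["git", "worktree", "add", "-b", branch, path, base])
    return cmds

for cmd in worktree_commands(["auth-fix", "lint-bump"]):
    print(shlex.join(cmd))
```

To actually create one, pass the list to subprocess.run(cmd, check=True). When the agent is finished, `git worktree remove <path>` cleans the checkout up.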

Codex builds worktree support right in, so multiple threads can hit the same repo at once without bumping into each other. Claude Code gives you the same isolation three ways. There’s git worktree itself, a --worktree flag that opens a session in its own checkout, and an isolation: worktree setting you put on a subagent so each helper gets a fresh checkout that cleans itself up afterward. (I wrote about the human side of all this in “The Orchestration Tax.”) Worktrees remove the mechanical collision, but YOU are still the ceiling. Your review bandwidth, not the tool, decides how many agents you can actually run.

Skills, so you stop explaining your project every single time

A skill is how you stop reexplaining the same project context every session like a goldfish. Both tools use the same format: a folder with a SKILL.md inside holding instructions and metadata, plus optional scripts, references, and assets. Codex runs a skill when you call it with $ or /skills. It also runs one on its own when your task matches the skill’s description, which is why a tight, boring description beats a clever one. Claude Code does it the same way, and I wrote the pattern up in “Agent Skills.”

Skills are also where intent stops costing you over and over. I argued in “The Intent Debt” that an agent starts every session cold and will fill any hole in your intent with a confident guess. A skill is that intent written down outside the model: the conventions, the build steps, the “we don’t do it like this because of that one incident.” You write it once, in a place the agent reads on every run. Without skills, the loop rederives your whole project from zero every cycle; with skills, it kind of compounds.

One thing to keep straight: the skill is the authoring format, and a plugin is how you ship it. If you want to share a skill across repos or bundle several together, you package them as a plugin. That’s true in Codex and true in Claude Code.

Plugins and connectors, so the loop touches your real tools

A loop that can only see the filesystem is a tiny loop. Connectors are built on MCP. They let the agent read your issue tracker, query a database, hit a staging API, or drop a message in Slack. Codex and Claude Code both speak MCP, so a connector you wrote for one usually just works in the other. Plugins bundle connectors and skills together, so your teammate installs your setup in one go instead of rebuilding the whole thing from memory.
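Part of why a connector travels between tools is that MCP is a shared wire format built on JSON-RPC 2.0. As a rough sketch, a client asking a server to run a tool sends a tools/call request like the one built below. The tool name create_issue and its arguments are hypothetical, because every server defines its own tools. The transport (stdio or HTTP) and the initialization handshake are left out.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value per request works

def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0) for a named tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

print(tools_call("create_issue", {"title": "Flaky test in test/auth", "labels": ["ci"]}))
```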

This is the difference between an agent that says “here is the fix” and a loop that opens the PR, links the Linear ticket, and pings the channel once CI is green, all by itself. Connectors are the reason the loop can act inside your actual environment instead of just telling you what it would do if it could.
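The “open the PR, link the ticket, ping when CI is green” sequence can be sketched as plain control flow over connector calls. The Connectors protocol and its method names are hypothetical placeholders for whatever MCP tools you’ve wired up. What matters is the ordering and the gate on CI status. The sketch checks CI once, while a real loop would come back and check again later.

```python
from typing import Protocol

class Connectors(Protocol):
    # Hypothetical connector surface; real MCP servers expose their own tool names.
    def open_pr(self, branch: str, title: str) -> str: ...
    def link_ticket(self, ticket: str, pr_url: str) -> None: ...
    def ci_status(self, pr_url: str) -> str: ...  # "pending" | "green" | "red"
    def post_message(self, channel: str, text: str) -> None: ...

def ship_fix(c: Connectors, branch: str, ticket: str, channel: str) -> str:
    """Open the PR and link the ticket; announce it only once CI is green."""
    pr_url = c.open_pr(branch, f"Fix {ticket}")
    c.link_ticket(ticket, pr_url)
    status = c.ci_status(pr_url)
    if status == "green":
        c.post_message(channel, f"{pr_url} is green and ready for review ({ticket})")
    return status
```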

Subagents, keep the maker away from the checker

The most useful structural thing in a loop, by far, is splitting the one who writes from the one who checks. The model that wrote the code is way too lenient grading its own homework. A second agent with different instructions, and sometimes a different model, catches the stuff the first one talked itself into.
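Here is a toy version of the split. The maker returns a candidate along with its own claim, and the checker ignores that claim and runs its own cases. The names maker and checker are illustrative. In a real loop each would be a separate agent, possibly on a different model, not a Python function.

```python
from typing import Callable

Candidate = Callable[[int], int]

def maker() -> tuple[Candidate, str]:
    """Drafts an implementation and, like any maker, says it is done."""
    def abs_value(x: int) -> int:
        return x if x > 0 else x  # subtle bug: negatives pass through unchanged
    return abs_value, "done, handles all integers"

def checker(candidate: Candidate, cases: dict[int, int]) -> list[int]:
    """Independent verification: returns the inputs where the candidate is wrong."""
    return [x for x, expected in cases.items() if candidate(x) != expected]

impl, claim = maker()
failures = checker(impl, {3: 3, 0: 0, -4: 4})
verdict = "accept" if not failures else f"reject: fails on {failures}"
print(claim, "->", verdict)  # done, handles all integers -> reject: fails on [-4]
```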

Codex only spawns subagents when you ask, runs them at the same time, and then folds the results back into one answer. You define your own agents as TOML files in .codex/agents/. Each one has a name, a description, instructions, and an optional model and reasoning effort, so your security reviewer can be a strong model on high effort while your explorer is some fast read-only thing. Claude Code does the same with subagents in .claude/agents/ and with agent teams that pass work between them. The usual split in both is that one agent explores, one implements, and one verifies against the spec.

I’ve made this case twice already, once as “The Code Agent Orchestra” and once as “Adversarial Code Review.” It matters specifically inside a loop because the loop runs while you’re not watching, so a verifier you actually trust is the only reason you can walk away. Subagents do burn more tokens, since each one does its own model and tool work, so spend them where a second opinion is worth paying for. This is also basically what Claude Code’s /goal does under the hood. A fresh model, not the one that did the work, decides whether the loop is done: the maker and checker split applied to the stop condition itself.

What one loop looks like

Stick it all together and a single thread becomes a little control panel. Here is one shape I keep using.

An automation runs every morning on the repo. Its prompt calls a triage skill that reads yesterday’s CI failures, the open issues, and the recent commits, then writes the findings into a Markdown file or a Linear board. For each finding worth doing, the thread opens an isolated worktree and sends a subagent to draft the fix. A second subagent then reviews that draft against the project skills and the existing tests.

Connectors let the loop open the PR and update the ticket. Anything the loop can’t handle lands in the triage inbox for me. The state file is the backbone of the whole thing. It remembers what got tried, what passed, and what’s still open, so tomorrow morning the run picks up where today’s stopped.
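Here is the morning loop above reduced to a single function, so the wiring is visible. Everything in it is hypothetical. triage, draft_fix, and review stand in for the triage skill and the two subagents. The worktree is just a path, and state is the progress record as a plain dict mapping each finding to “open,” “passed,” or “needs-human.”

```python
from typing import Callable

def morning_run(
    state: dict[str, str],                  # finding -> "open" | "passed" | "needs-human"
    triage: Callable[[], list[str]],
    draft_fix: Callable[[str, str], str],   # (finding, worktree path) -> draft
    review: Callable[[str, str], bool],     # (finding, draft) -> approved?
    inbox: list[str],
) -> dict[str, str]:
    """One cycle: triage, fix each open finding in its own worktree, review, record the outcome."""
    for finding in triage():
        state.setdefault(finding, "open")   # new findings join; settled ones keep their status
    for i, (finding, status) in enumerate(list(state.items())):
        if status != "open":
            continue                        # already settled on an earlier run
        worktree = f"../wt-{i}"             # hypothetical isolated checkout per finding
        draft = draft_fix(finding, worktree)
        if review(finding, draft):          # the checker, not the maker, decides
            state[finding] = "passed"       # a connector would open the PR here
        else:
            state[finding] = "needs-human"
            inbox.append(finding)           # whatever the loop can't handle goes to you
    return state
```

Persist state between runs (for example as the Markdown checklist from the memory section) and tomorrow’s call skips everything already settled.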

And look at what you actually did there. You designed it once. You didn’t prompt any of those steps. That’s Steinberger’s whole point made real, and it’s the same loop in Codex or in Claude Code because the pieces are the same pieces.

What the loop still doesn’t do for you

The loop changes the work; it doesn’t delete you from it. And three things actually get harder as the loop gets better, not easier.

Verification is still on you. A loop running unattended is also a loop making mistakes unattended. The whole reason you split the verifier subagent from the maker is to make the loop’s “it’s done” mean something, and even then “done” is a claim, not a proof. I keep repeating the same line from “Code Review in the Age of AI”: your job is to ship code you’ve confirmed works.

Your understanding still rots if you let it. The faster the loop ships code you didn’t write, the bigger the gap between what exists and what you actually understand. That’s comprehension debt, and a smooth loop just makes it grow faster unless you read what the loop made.

And the comfortable posture is the dangerous one. When the loop runs itself, it’s very tempting to stop having an opinion and just take whatever it gives back. I called that “cognitive surrender.” Designing the loop is the cure when you do it with judgment and the accelerant when you do it to avoid thinking: same motion, opposite result.

Build the loop. Stay the engineer.

I think this is a preview of how our work is going to evolve. That said, if I weren’t reviewing the code myself, or if I relied solely on automated loops to fix it, my product’s quality would suffer. I’d likely end up stuck in a downward spiral, digging myself into an ever deeper hole.

Go ahead and set up your loops, but don’t forget that prompting your agents directly is also effective. It’s all about finding the right balance.

Loops can also produce different results depending on who runs them. Two people can build the exact same loop and get completely opposite outcomes. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop doesn’t know the difference. You do.

That’s what makes loop design harder than prompt engineering. Cherny’s point isn’t that the work got easier. It’s that the leverage point moved.

Build the loop. But build it like someone who intends to stay the engineer, not just the one who presses go.
