Loop Engineering
Original author: Addy Osmani
Compiled by: Peggy, BlockBeats

Editor’s Note: The way AI coding agents are used is shifting from “humans manually writing prompts and advancing tasks step by step” to “humans designing loops that enable the system to continuously orchestrate agents.” Addy Osmani’s concept of Loop Engineering centers on building a workflow that automatically discovers tasks, assigns them, verifies results, tracks progress, and determines the next steps.

This cycle consists of five main modules: Automations (for scheduling and triaging tasks), Worktrees (to isolate multiple parallel development environments), Skills (to capture project knowledge and team conventions), Plugins/Connectors (to integrate with real tools like GitHub, Linear, Slack, and databases), and Sub-agents (to separate executors from reviewers), along with an external memory layer—such as Markdown files or Linear boards—to store state and progress.

The article emphasizes that the significance of Loop Engineering lies not merely in "making AI run more rounds," but in embedding engineers' judgment into system design. Loops can significantly amplify developers' leverage, but they do not replace verification, understanding, or decision-making. The real risk is not in using loops, but in using them as an excuse to avoid understanding the code and system. The key skill for collaborating with AI in programming in the future may no longer be just writing a good prompt, but designing reliable, verifiable, and sustainably operating agent workflows.

The following is the original text:

Loop engineering means replacing yourself as "the person who writes prompts for the agent": you design a system that prompts the agent on your behalf. The "loop" here can be understood as a recursive goal: you define an objective, and the AI iterates until the task is complete. It has roughly five components, and both Claude Code and Codex now provide all five.

I believe this may be how we’ll eventually collaborate with coding agents. But all of this is still in its early stages, and I remain skeptical. You absolutely need to be cautious about token costs, as expenses can vary dramatically depending on usage patterns—especially whether you’re “token-rich” or “token-constrained.” You also need some mechanism to ensure quality doesn’t degrade. Concerns about “AI slop” are valid. That said, let’s take a closer look at what’s actually going on here.

@steipete (Peter Steinberger) recently said: “You shouldn’t write prompts for coding agents anymore. You should design loops that prompt your agents.” Similarly, @bcherny (Boris Cherny), head of Claude Code at Anthropic, said: “I no longer prompt Claude directly. I have a set of loops running that prompt Claude and decide for themselves what to do next. My job is to write the loops.”

So, what does this actually mean?

For the past couple of years, if you wanted to get something done with a coding agent, the basic approach was to write a good prompt and provide sufficient context. You’d type in one sentence, read the response, then type the next. The agent was a tool, and you were holding onto it, pushing it forward turn by turn. This phase has, to some extent, come to an end—at least, some believe it’s about to.

Now, you're building a small system: it discovers tasks on its own, assigns them, checks results, logs completion, and decides what to do next. In other words, you let this system drive the agents rather than prompting them manually again and again. I previously wrote about two of its "cousins": agent harness engineering, which sets up a runtime environment for a single agent, and the factory model, which is a system for building software. Loop engineering sits one level above the harness. It resembles a harness, but it runs on a timer, spawns helper agents, and feeds itself.

What surprised me is that this is no longer just a “tooling” problem. A year ago, if you wanted a loop, you had to write a pile of bash scripts and maintain them forever. It was your own contraption, and yours alone. Now these components are built directly into the products. The capabilities Steinberger listed map almost one-to-one onto the Codex app, and just as easily onto Claude Code. Once you see that the two share the same shape, you stop worrying about which tool to use and focus on designing a loop that keeps running no matter which tool you are sitting in.

Five components, along with some explanations

A loop requires five things, plus a place to store information. I'll list them first, then map each one onto the two products.

First, Automations: Triggered on a schedule to automatically discover and route tasks.

Second, Worktrees: Prevent two parallel agents from overwriting each other’s files.

Third, Skills: Document the project knowledge to avoid having the agent guess each time.

Fourth, Plugins and Connectors: Enable the agent to integrate with tools you are already using.

Fifth, Sub-agents: One proposes solutions, and another reviews them.

Then there’s the sixth thing: memory. It can be a Markdown file, a Linear board, or any standalone system that persists “completed tasks” and “next steps” beyond a single conversation. It may sound too simple to matter, but this is the exact same technique every long-running agent relies on. I’ve written about this in detail in the context of long-running agents: models forget between runs, so memory must be stored on disk, not in context. The agent forgets—but the code repository doesn’t.
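The on-disk memory described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how either product stores state: the checklist layout and the names LoopState, load_state, save_state, and complete are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class LoopState:
    """What the loop has finished and what it should pick up next."""
    done: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def render_state(state: LoopState) -> str:
    """Serialize the state as a small Markdown checklist (hypothetical layout)."""
    lines = ["# Loop state", "", "## Done"]
    lines += [f"- [x] {item}" for item in state.done]
    lines += ["", "## Next"]
    lines += [f"- [ ] {item}" for item in state.next_steps]
    return "\n".join(lines) + "\n"


def parse_state(text: str) -> LoopState:
    """Read the checklist back; lines that are not checklist items are ignored."""
    state = LoopState()
    for line in text.splitlines():
        if line.startswith("- [x] "):
            state.done.append(line[len("- [x] "):])
        elif line.startswith("- [ ] "):
            state.next_steps.append(line[len("- [ ] "):])
    return state


def load_state(path: Path) -> LoopState:
    """Load state from disk, or start empty on the very first run."""
    if not path.exists():
        return LoopState()
    return parse_state(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: LoopState) -> None:
    """Persist the state so the next run can resume from it."""
    path.write_text(render_state(state), encoding="utf-8")


def complete(state: LoopState, item: str) -> None:
    """Move an item from Next to Done."""
    if item in state.next_steps:
        state.next_steps.remove(item)
    if item not in state.done:
        state.done.append(item)
```

Each run would call load_state at the start and save_state at the end, so nothing depends on what the model remembers from the previous conversation.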

Now, both products have all five components.

They may have different names in some places, but their capabilities are essentially the same. I’ll go through them one by one, because honestly, whether a loop runs stably or quietly leaks everywhere comes down to the details.

Automations: This is the heartbeat of the loop

Automations are what make a loop truly a loop, rather than a one-time task you ran manually once. In the Codex app, you can create an automation under the Automations tab by selecting a project, the prompt it should run, its frequency, and whether it runs in your local checkout or in a background worktree. Results that identify issues are routed to the Triage inbox, while runs without issues are automatically archived—this is great. OpenAI itself uses it for dull but necessary tasks like daily issue triage, summarizing CI failures, writing commit briefs, and tracking bugs introduced last week. Automations can also invoke skills, so you can keep recurring tasks maintainable: trigger $skill-name instead of pasting a wall of instructions into a scheduled task no one will ever update.

Claude Code can achieve the same effect, just through a different path: it uses scheduling and hooks. You can run a prompt or command at fixed intervals using /loop, schedule a cron job, or trigger shell commands via hooks at specific points in the agent’s lifecycle. If you want it to continue running after you close your laptop, you can also deploy the entire setup to GitHub Actions. The core idea is identical: you define an autonomous task, give it a rhythm, and let the results come to you rather than having to check everywhere yourself.

Two more primitives are worth understanding, because they get closer to the core topic of this article. /loop re-runs a prompt at a steady rhythm, while /goal keeps executing until a condition you specify is actually met. After each round, a separate small model evaluates whether the task is complete, so the agent writing the code is not the one grading itself. You can give it a condition such as “all tests in test/auth pass and lint is clean,” then walk away. Codex has the same capability, also called /goal: it works across rounds until a verifiable stopping condition is met, and supports pausing, resuming, and clearing. The same primitive exists in both tools, which is the recurring pattern in this article.
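The shape of such a goal-driven loop can be sketched as follows. This is an illustration of the pattern only, not the implementation of /goal in either tool: run_until_goal, work, goal_met, and the max_rounds cap are hypothetical names, and the round cap is an added assumption to keep an unreachable goal from running forever.

```python
from __future__ import annotations

from typing import Callable


def run_until_goal(
    work: Callable[[int], str],
    goal_met: Callable[[str], bool],
    max_rounds: int = 10,
) -> tuple[bool, int]:
    """Run `work` repeatedly until the independent `goal_met` check passes.

    Returns (succeeded, rounds_used). `work` stands in for the agent doing
    one round of changes; `goal_met` stands in for the separate judge.
    """
    for round_number in range(1, max_rounds + 1):
        result = work(round_number)   # the maker does one round of work
        if goal_met(result):          # a separate judge decides "done"
            return True, round_number
    # The cap was reached without the judge approving the result.
    return False, max_rounds
```

The point of passing goal_met in separately is that the code producing the result never decides whether the result is good enough. In practice the check could be anything verifiable, such as the exit status of a test run.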

So, Automations bring the work to the surface, while the rest of the loop handles that work.

Worktrees: Keep parallel workflows organized, not chaotic

Once you run more than one agent, file conflicts become a point of failure. Two agents writing to the same file simultaneously is as problematic as two engineers modifying the same line of code without communication. A git worktree solves this. It is a separate working directory on its own branch that shares the same repository history, so one agent’s changes cannot touch another agent’s checkout.
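As a rough sketch of what giving each agent its own worktree could look like when scripted by hand, the helpers below build standard `git worktree add` and `git worktree remove` commands. The branch naming scheme agent/<task_id>, the sibling directory layout, and the function names are illustrative assumptions, not conventions of Codex or Claude Code.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def worktree_add_cmd(repo: Path, task_id: str) -> tuple[list[str], Path]:
    """Build the `git worktree add` command for one agent's task.

    The checkout is placed next to the repository and gets a new branch,
    so nothing the agent writes lands in another agent's directory.
    """
    path = repo.parent / f"{repo.name}-wt-{task_id}"
    cmd = [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", f"agent/{task_id}",   # hypothetical branch naming scheme
        str(path),
    ]
    return cmd, path


def worktree_remove_cmd(repo: Path, path: Path) -> list[str]:
    """Build the command that deletes the worktree once the task is done."""
    return ["git", "-C", str(repo), "worktree", "remove", str(path)]


def create_worktree(repo: Path, task_id: str) -> Path:
    """Create the isolated checkout; raises CalledProcessError if git fails."""
    cmd, path = worktree_add_cmd(repo, task_id)
    subprocess.run(cmd, check=True)
    return path
```

Keeping command construction separate from execution makes the naming scheme easy to check without touching a real repository.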

Codex has built-in worktree support, so multiple threads can simultaneously process the same repository without interfering with each other. Claude Code can achieve the same isolation via git worktree: you can open a session in a separate checkout using the --worktree flag, or set isolation: worktree on subagents to give each assistant a fresh checkout that is automatically cleaned up afterward. I’ve written about the human side of this in The Orchestration Tax: worktrees eliminate mechanical conflicts, but you are still the bottleneck. What truly determines how many agents you can run simultaneously isn’t the tool—it’s your review bandwidth.

Skills: Save you from having to re-explain projects each time

A skill spares you from re-explaining the same context in every session, as if talking to a goldfish. Both tools use the same format: a folder containing a SKILL.md file that holds instructions and metadata, along with optional scripts, references, and resource files. Codex runs a skill when you invoke it with $ or /skills, and also runs it automatically when your task matches the skill’s description. This is why a concise, plain description often works better than a clever, elaborate one. Claude Code follows the same approach; I’ve described this pattern in agent skills.
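To make the "instructions plus metadata" idea concrete, here is a small sketch that splits a SKILL.md-style file into its metadata and its instructions. It assumes the metadata sits in a front-matter block delimited by `---` lines with flat `key: value` entries such as name and description; the example skill, its contents, and the parse_skill helper are hypothetical, and this is not a full YAML parser.

```python
from __future__ import annotations

# A hypothetical skill file, used only to illustrate the layout.
EXAMPLE_SKILL = """---
name: ci-triage
description: Summarize yesterday's CI failures and group them by likely cause.
---
1. List failed CI runs from the last 24 hours.
2. Group failures by test file.
3. Append findings to the loop state file.
"""


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (metadata, instructions).

    Only flat `key: value` metadata lines are handled. A file without a
    complete front-matter block is treated as instructions only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    metadata: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the instruction body.
            body = "\n".join(lines[index + 1:]).strip()
            return metadata, body
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}, text
```

The description field is the part a tool would match against the task, which is why a short, literal description tends to serve better than a clever one.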

Skills are also where unstated intent stops costing you over and over. As I wrote in intent debt, agents begin each conversation from a cold start; wherever your intent has a gap, they fill it with confident guesses. Skills externalize that intent: project conventions, build steps, “we don’t do this because of that earlier incident,” and so on, all written once in a place the agent reads on every run. Without skills, the loop must re-derive your entire project from scratch each round; with skills, the knowledge compounds.

One thing to clarify: a skill is the authoring format, while a plugin is the distribution method. When you want to share a skill across multiple code repositories or bundle several skills together, you package them as a plugin. This applies to Codex, and also to Claude Code.

Plugins and connectors: Let the loop connect to your real tools

A loop that can only access the file system is a small loop. Connectors, built on MCP, allow agents to read your issue tracker, query databases, call staging APIs, or send messages in Slack. Both Codex and Claude Code support MCP, so a connector you write for one typically works in the other as well. Plugins bundle connectors and skills together, enabling your teammates to install the full configuration in one go, rather than trying to reconstruct it from memory.

This is the difference between “an agent telling you ‘here’s the fix’” and “a loop that opens a PR, links to a Linear ticket, and notifies the channel after CI passes.” Connectors matter because they enable the loop to act within your real environment, not just say, “If I could, I would.”

Sub-agents: Keep the maker separate from the checker

Within a loop, the most useful structural design is to separate the agent that writes from the agent that reviews. The model that wrote the code tends to be too lenient when grading its own work. A second agent, with different instructions and sometimes even a different model, can catch issues that the first one talked itself into accepting.

Codex spawns subagents only when you ask for them; they run in parallel and then merge their results into a single answer. You can define your own agents using TOML files in .codex/agents/: each agent has a name, description, instructions, and an optional model and reasoning effort. For example, your security reviewer can be a strong model with high reasoning effort, while your explorer can be a fast, read-only lightweight model. Claude Code offers similar capabilities through subagents and agent teams in .claude/agents/, allowing multiple agents to pass work to one another. The most common division of labor on both sides is: one agent explores, one implements, and one validates against specifications.

I’ve articulated this point twice before: once in Code Agent Orchestra, and again in Adversarial Code Review. It’s especially critical within a loop, because loops run when you’re not watching—so a verifier you truly trust is the only reason you can safely walk away. Subagents do consume more tokens, since each agent makes its own model and tool calls, so you should deploy them where a second opinion is worth paying for. This is essentially what Claude Code’s /goal does at its core: using a separate model to judge whether the loop is complete, rather than letting the model that did the work make that call. In other words, it applies the separation of “creator” and “checker” to the stopping condition itself.

What does a loop look like?

Putting these together, a single thread becomes a small control panel. Below is a structure I often use.

Every morning, an automation runs on the code repository. Its prompt invokes a triage skill that reads yesterday’s CI failures, open issues, and recent commits, then logs the findings into a Markdown file or Linear board. For each issue worth addressing, the thread opens an isolated worktree, dispatches a sub-agent to draft a fix, and then assigns a second sub-agent to review the proposal based on the project’s skills and existing tests.

Connectors enable this loop to automatically open PRs and update tickets. Anything the loop cannot handle is routed to the triage inbox for me to address. The status file is the backbone of the entire system: it tracks what has been attempted, what succeeded, and what remains incomplete, ensuring that tomorrow’s run picks up exactly where today’s left off.
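The control flow of the loop described above can be sketched as a single pass over the day's findings. Everything here is a simplified stand-in under stated assumptions: run_daily_loop, RunReport, draft_fix, review, and the retry limit are hypothetical, and in a real loop the two callables would be sub-agents working inside isolated worktrees, with connectors opening the PRs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class RunReport:
    """Outcome of one pass of the loop."""
    opened_prs: list[str] = field(default_factory=list)  # fixes that passed review
    inbox: list[str] = field(default_factory=list)       # items left for a human


def run_daily_loop(
    findings: Iterable[str],
    draft_fix: Callable[[str], str],
    review: Callable[[str, str], bool],
    max_attempts: int = 2,
) -> RunReport:
    """Draft a fix for each finding, have it reviewed, and route the result."""
    report = RunReport()
    for finding in findings:
        for _ in range(max_attempts):
            patch = draft_fix(finding)      # maker sub-agent proposes a fix
            if review(finding, patch):      # separate reviewer sub-agent checks it
                report.opened_prs.append(finding)
                break
        else:
            # The reviewer never approved a patch: escalate instead of shipping.
            report.inbox.append(finding)
    return report
```

The report is what would be written to the status file at the end of the run, so the next morning's pass knows which findings were resolved and which are still waiting in the inbox.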

Notice what you actually did: you designed this once, and none of those steps was individually prompted by you. This is the real-world version of Steinberger’s quote. The same loop can run on Codex or Claude Code because both offer the same set of components.

What the loop still won’t do for you

The loop changes how the work gets done, but it won’t remove you from the process. In fact, as loops become stronger, three problems become more acute, not less.

Verification still depends on you. A loop running unattended may also be making mistakes unattended. You separated the verifier sub-agent from the maker precisely so that the loop’s claim of “done” has some meaning. Even then, “done” is still a claim, not proof. I keep repeating the same line in “Code Review in the Age of AI”: your responsibility is to deliver code you have confirmed to be valid.

If you leave it unattended, your own understanding will rot. The faster the loop delivers code you didn’t write yourself, the wider the gap between what you actually understand and what actually exists in the system. This is comprehension debt. If you don’t read what the loop produces, a smooth loop only makes this debt grow faster.

And yes, the most comfortable posture is often also the most dangerous one. When a loop can run on its own, it’s easy to stop forming your own judgments and simply accept whatever it returns. I call this cognitive surrender. If you design a loop with discernment, it becomes the antidote; if you design a loop merely to avoid thinking, it becomes an accelerant. The same action can produce completely opposite results.

Build a loop, but still be an engineer

I believe this is the direction our work is heading. That said, if I stop reviewing the code myself and rely entirely on automated loops to fix it, the quality of my product will suffer, and I am likely to fall into a downward spiral, digging myself into ever deeper holes.

So, you can certainly build your own loop, but don’t forget that directly prompting your agent is still effective. The key is finding the right balance.

The results of a loop will vary from person to person. Two people can build the exact same loop and yet achieve completely opposite outcomes. One uses it to accelerate work they deeply understand; the other uses it to avoid understanding the work itself. The loop doesn’t know the difference between the two. You do.

This is why loop design is harder, not easier, than prompt engineering. Cherny didn’t mean the work became easier, but that the leverage point shifted.

Build the loop. But build it like someone who still intends to be an engineer, not like someone who just presses the “start” button.
