Editor’s Note: As AI agents become cheaper and easier to invoke, software development is entering a new phase: the question is no longer whether we can launch more agents, but whether humans still have enough attention to manage, evaluate, and integrate their outputs.

This article introduces an insightful concept: "orchestration tax." Launching an agent is inexpensive—requiring only a single prompt or a single click—but the real cost lies in subsequent steps: verifying the accuracy of results, understanding their impact on system architecture, resolving conflicts between different agents, and ultimately deciding which code can be merged into the main branch. These tasks cannot be easily parallelized and still rely on the same sequential resource: human judgment.

The author compares developers to the GIL in an AI Agent system—the single-threaded lock that limits the final throughput of a concurrent system. While multiple agents can run simultaneously, whenever they reach stages such as architectural decision-making, code review, or conflict resolution, they must all pass back through the developer’s mind. As a result, more agents do not necessarily mean higher output; they may simply lead to a longer queue of tasks awaiting review, causing developers to experience more frequent context switches and cognitive fatigue.

This is also a commonly overlooked point in the current AI programming tools boom: the feeling of efficiency isn't always the same as real productivity. A dashboard filled with running agents can create an illusion of high output; but if developers don’t truly understand, review, and integrate these changes, the system may ultimately accumulate not productivity, but technical debt and cognitive debt.

Therefore, what this article truly discusses is not "how to use more agents," but "how to redesign workflows around human attention." In the age of agents, the key skill is not just knowing how to ask questions or assign tasks, but understanding which tasks can be delegated to machines for parallel processing and which must remain under human judgment; knowing when to batch review, when to stop orchestrating, and when to refocus on a single core issue.

AI is expanding the concurrency of software production, but human attention remains the most scarce and irreplaceable resource in the system. Truly mature agent workflows do not delegate all tasks to machines; instead, they carefully design attention architectures, much like designing a production system.

The following is the original text:

Now, launching more AI agents has become much easier. But having more agents running simultaneously doesn’t mean “you” have multiplied. Your cognitive bandwidth cannot be parallelized. All the judgment required to guide them, evaluate their outcomes, and integrate or refine their outputs still must pass through the same serial processor—yourself.

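The so-called "orchestration tax" is, in essence, the price you pay for ignoring this. And the only real solution is to start designing your own attention the way you would design any concurrent system.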

I previously participated in a roundtable discussion at Google I/O with Richard Seroter, Aja Hammerly, and Ciera Jaspan, discussing what software engineering looks like today and how it might evolve next. Near the end, Richard asked us: What is the one thing developers should take away and change after listening?

Attention architecture

I’ve been reflecting on this for months: feeling busy doesn’t mean you’re actually producing results. You can run 20 agents simultaneously and feel overwhelmed—but that doesn’t mean you’ve delivered the equivalent work output of 20 agents.

Earlier in that conversation, Richard gave a name to this issue. He said, "What you just described is essentially an orchestration tax. You can't successfully manage 20 agents in your head."

He is absolutely right. I’d like to break down this concept more thoroughly, because this isn’t a matter of discipline—it’s an architectural issue.

In that roundtable, I casually said something that has since lingered in my mind: Running multiple agents doesn't mean there's another version of you in the world.

The asymmetry people don't account for

There is a hidden asymmetry in the agent workflow.

Launching an agent is very cheap: a single keystroke or a single prompt is enough. Closing the loop on that agent is anything but cheap. Someone always has to verify that its output is correct and reconcile it with changes made by other agents.

That someone is you, and there is only one of you.

Last month, I touched on part of this issue in "Your Parallel Agent Limit," which focused mainly on the ambient anxiety of not knowing which parallel thread is quietly failing. This article looks at the structure behind those costs.

When you begin to view agent development as a concurrent system, you realize that humans are merely one component within that system—a very slow, sequential component.

You are the single-threaded resource.

If you've written concurrent code, you already have the intuition to understand this issue—you've just applied that intuition in the wrong place before.

Python's reference implementation, CPython, has a Global Interpreter Lock, or GIL, in its standard build. You can create as many threads as you like, but only one thread can execute Python bytecode at a time, because every thread must first acquire this lock.

You are the GIL of your AI agents.

They can all run simultaneously. But whenever their work requires a deep understanding of the system architecture or involves resolving merge conflicts, they must first obtain this lock—and there is only one lock, held by you.
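The analogy can be sketched in a few lines of Python. This is only an illustration: `attention_lock`, `run_agents`, and the agent names are hypothetical stand-ins, and the "review" is reduced to appending a string. The point is that however many threads start, the number of reviews in progress never exceeds one.

```python
import threading


def run_agents(agent_count: int) -> tuple[int, int]:
    """Run agents as threads; return (peak concurrent reviews, merged changes)."""
    attention_lock = threading.Lock()  # hypothetical stand-in for the one human reviewer
    state = {"in_review": 0, "peak_in_review": 0}
    merged: list[str] = []

    def agent(name: str) -> None:
        # Producing a draft needs no lock, so every agent can do this at once.
        draft = f"{name}: proposed change"
        # Judgment is serialized: the agent waits here until the reviewer is free.
        with attention_lock:
            state["in_review"] += 1
            state["peak_in_review"] = max(state["peak_in_review"], state["in_review"])
            merged.append(draft)
            state["in_review"] -= 1

    threads = [
        threading.Thread(target=agent, args=(f"agent-{i}",))
        for i in range(agent_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak_in_review"], len(merged)


peak, merged_count = run_agents(8)
print(peak, merged_count)  # 1 8
```

Eight agents ran, eight changes were merged, and at no point was more than one of them under review.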

Amdahl's Law puts it very precisely: the maximum speedup from parallelization is bounded by the portion of the work that must still be done sequentially. If a large part of your process cannot be parallelized, then no matter how many cores you throw at it, you will eventually hit a hard ceiling.

In agent development, this sequential part is judgment.
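To make the ceiling concrete, here is a small sketch of Amdahl's Law. The function name and the 50% serial fraction are assumptions chosen for illustration, not measurements of any real workflow.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Amdahl's Law: speedup = 1 / (s + (1 - s) / n).

    serial_fraction (s) is the share of work that cannot be parallelized,
    here standing in for the share that needs human judgment.
    workers (n) is the number of parallel workers, here agents.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Assumed for illustration: half of the work is judgment.
for agents in (1, 2, 8, 20):
    print(agents, round(amdahl_speedup(0.5, agents), 2))
# 1 1.0
# 2 1.33
# 8 1.78
# 20 1.9
```

Under that assumption, going from 8 agents to 20 moves the speedup from about 1.78x to about 1.9x, and no number of agents can push it past 2x.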

Launching eight agents won't speed up your decisions; it only lengthens the queue of work waiting on you.

This is a well-established fact in performance engineering, yet many are still surprised by it: optimizing a component that is not the bottleneck does not improve overall throughput. You are simply piling up more unfinished work in front of the bottleneck.

Adding agents optimizes a stage that was never the bottleneck. The true constraint is the review stage, and the throughput of the whole system is exactly the throughput of that stage.
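A toy simulation shows the effect. All rates below are made-up numbers, and `simulate_review_queue` is a hypothetical helper: each agent is assumed to produce one change per hour, and the reviewer is assumed to handle two changes per hour.

```python
def simulate_review_queue(
    agents: int,
    changes_per_agent_per_hour: int,
    reviews_per_hour: int,
    hours: int,
) -> tuple[int, int]:
    """Return (changes merged, changes still waiting) after the given hours."""
    backlog = 0
    merged = 0
    for _ in range(hours):
        # Agents produce in parallel; their output lands in the review queue.
        backlog += agents * changes_per_agent_per_hour
        # The reviewer is the serial stage and can only clear so much per hour.
        reviewed = min(backlog, reviews_per_hour)
        backlog -= reviewed
        merged += reviewed
    return merged, backlog


print(simulate_review_queue(2, 1, 2, 8))  # (16, 0)
print(simulate_review_queue(8, 1, 2, 8))  # (16, 48)
```

Quadrupling the agents merges exactly the same 16 changes in a working day. The only thing that grows is the backlog of 48 unreviewed changes.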

The orchestration tax is the structural gap between what your agents can produce and what you can actually integrate. It is what you pay when you assign a single-threaded resource to manage a concurrent system.

Pushing through won't fix a structural limit

At that roundtable, I said: I have never felt so productive with my tools, and I have never been so tired.

Both feelings are completely real, and they stem from the same cause.

This fatigue has a very specific source: it’s the feeling of constantly pushing a single-threaded processor to 100% with no headroom whatsoever.

Every time you return to an agent that has dropped out of your focus, you pay a context-switching cost: you have to flush what is in your head and reload a different context from scratch.

A CPU can do this in microseconds, and architects still work hard to avoid frequent switching. You need minutes, and you never restore the context perfectly.

Five agents are not the same job done five times over. They are five cold-start context reloads, plus a background process constantly worrying about which agent you should be checking right now.

You can't solve a structural limitation by trying harder. This tax must always be paid.

If you try to push through, the tax resurfaces in another form: either your code reviews become increasingly superficial, or you fall into a state of "cognitive surrender," where forming your own judgment costs so much mental energy that you simply accept whatever the agent wrote.

Either pay this tax proactively, or let it quietly erode your understanding of your own system.

Design your attention the way you would design a system

Therefore, you must treat your attention as a scarce, serial resource.

You wouldn’t design a distributed system without considering bottlenecks at all—so give your brain the same respect.

Here are some methods that have truly worked for me:

Size your agent fleet to your review capacity, not to what the UI allows.

A well-designed concurrent system uses backpressure to prevent queues from growing indefinitely. Producers must slow down to match the consumer's processing capacity.

Your agents are the producers; your review capacity is the consumer. The right number of parallel agents is the number whose output you can review thoroughly. For most people, that is a very low single digit.

AI tools will happily let you launch 20 agents, but that’s just a UI feature—it doesn’t mean you actually have the capacity to manage them.
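One way to picture backpressure is a bounded queue that refuses new work once the review backlog is full. The sketch below uses Python's standard `queue.Queue` with a `maxsize`; the `AgentPool` class, its method names, and the capacity of 3 are hypothetical choices for illustration.

```python
import queue


class AgentPool:
    """Admit new agent tasks only while the review backlog has room."""

    def __init__(self, review_capacity: int) -> None:
        # The bound on this queue is the backpressure limit.
        self._awaiting_review = queue.Queue(maxsize=review_capacity)

    def try_launch(self, task: str) -> bool:
        """Launch an agent for the task, or refuse if the backlog is full."""
        try:
            self._awaiting_review.put_nowait(task)
        except queue.Full:
            return False  # producer must wait for the consumer to catch up
        return True

    def review_next(self) -> str:
        """Review the oldest pending task, freeing one slot."""
        return self._awaiting_review.get_nowait()


pool = AgentPool(review_capacity=3)  # assumed capacity, not a recommendation
print([pool.try_launch(f"task-{i}") for i in range(5)])
# [True, True, True, False, False]
print(pool.review_next())          # task-0
print(pool.try_launch("task-5"))   # True
```

The fourth and fifth launches are refused until a review completes, so the producer's rate is tied to the consumer's rate rather than to how many agents the tool is willing to start.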

Sort your tasks.

When Richard asked me how I handle this, I described this approach. I divide my tasks into two piles.

The first pile holds relatively independent tasks that I'm happy to hand to an agent running in the background in the cloud. These tasks can run asynchronously and typically need only a final review from me before they are done.

The second pile holds complex tasks where the work itself is judgment, such as diagnosing a strange bug or designing an architecture.

The biggest mistake is trying to parallelize tasks from the second pile. Running several complex tasks in parallel won't increase your output. It only causes repeated contention for that one lock and ultimately degrades every result.

Batch review.

Each context switch comes at a high cost. Sitting down to review the results of all four agents at once is far more efficient than reviewing one, doing something else, and then restarting to review another.

Give the agents a longer leash. Let the work accumulate a little, then process it as a batch.
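A back-of-the-envelope model makes the difference visible. It is deliberately crude: it assumes each separate sitting costs one fixed context reload, and the 10-minute review and 15-minute reload figures are invented for illustration.

```python
def total_review_minutes(
    reviews: int,
    minutes_per_review: int,
    reload_minutes: int,
    sittings: int,
) -> int:
    """Total time spent when the reviews are spread over the given sittings.

    Crude assumption: every sitting pays one context reload, no matter how
    many reviews it contains.
    """
    if not 1 <= sittings <= reviews:
        raise ValueError("sittings must be between 1 and the number of reviews")
    return reviews * minutes_per_review + sittings * reload_minutes


# Four agent results, reviewed one at a time between other work.
print(total_review_minutes(4, 10, 15, sittings=4))  # 100
# The same four results, reviewed in a single sitting.
print(total_review_minutes(4, 10, 15, sittings=1))  # 55
```

Under these assumed numbers, the review work itself is 40 minutes either way. What changes is the switching overhead, which drops from 60 minutes to 15.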

Use this lock solely for judgment.

Don't spend your brain on things machines can verify for themselves. Have the agent write tests and show that they pass, or produce screenshots as evidence.

Let them prove the 80% that’s dull but verifiable. That way, your scarce attention can focus solely on the 20% that truly requires human judgment.

Protect your serial time.

The bottleneck deserves your best hours, not the fragments left over between agent check-ins.

Sometimes the highest-leverage move is to stop orchestrating entirely: close the screen full of agents, focus on a single problem, and hold the lock yourself from start to finish.

Orchestration isn't the real work. It is overhead generated around the work.

Aja pointed out that architectural skills have now become the most urgent skill: you need to know which tasks are suitable for an Agent and which are too large for it.

I’d also like to add: you are yourself a component within this system. Your attention has a known, very low serial throughput. The system either respects this limit or circumvents it by quietly lowering your standards.

Being busy doesn't mean being productive.

This is very important because this failure mode is nearly invisible to you personally.

Twenty running agents will give you a feeling of “maximum productivity”—a dashboard packed with activity, everything in motion. But this sensation has become disconnected from actually merging high-quality code into the main branch.

You can be maximally busy yet produce almost nothing. From an internal perspective, these two states are nearly identical.

Ciera mentioned Margaret-Anne Storey’s research on debt. We discussed technical debt and cognitive debt.

Failing to pay the orchestration tax makes you accumulate both kinds of debt at once.

You merge things you never read carefully. Your mental model of the codebase goes stale. None of this shows up on the dashboard today. It surfaces when the system fails in production and you are suddenly staring at it, realizing you no longer understand how it actually works.

So the real conclusion is: launching an agent isn't a capability. Anyone can run 20.

True capability lies in designing systems around serial resources that cannot be cloned or parallelized.

This resource is your attention.

Design it as you would any critical component relied upon in a production environment.