A harness for every task: dynamic workflows in Claude Code

Original author: @trq212

Compiled by: Peggy

Editor's Note: Claude Code is evolving from a code assistant into an orchestratable Agent workspace.

The workflows described in this article provide core value by enabling Claude to move beyond merely “thinking then acting” within a single context window, and instead dynamically generate an execution framework: breaking down tasks, assigning sub-agents, processing in parallel, cross-validating, iterating in cycles, and even allowing different agents to compete before synthesizing the final results.

This means that the use cases for Claude Code are clearly expanding beyond its original scope. It is not only useful for code migration, refactoring, test reproduction, and code review, but also for non-technical tasks such as in-depth research, fact-checking, resume screening, incident retrospectives, rule documentation, business plan evaluation, and naming brainstorming. Many complex tasks are fundamentally similar to programming: they require breaking down problems, isolating context, validating assumptions, managing large volumes of details, and making decisions among multiple potential paths.

Dynamic workflows aim to address several failure modes that large models commonly show on long tasks: "agentic laziness," where the model declares completion partway through; "self-preferential bias," where it favors its own conclusions; and "goal drift," where its behavior gradually deviates from the original objective over many iterations. By assigning the work to multiple Claudes with independent contexts, workflows turn a complex task from a "single-agent marathon" into a "multi-agent collaboration."

Of course, workflows are not a universal solution. They often consume more tokens and may not be suitable for every routine coding task. But they point to an important direction: the future competition among AI tools may not just be about how intelligent a single model is, but whether it can organize a reliable, reusable, and auditable execution process around complex goals.

The following is the original text:

Although the default Claude Code execution framework is built for programming, it is also suitable for many other types of tasks. Many tasks have been found to be structurally similar to programming tasks. However, to achieve optimal performance for certain specific task types, we still need to build customized execution frameworks on top of Claude Code, such as for research, security analysis, agent team collaboration, or code review.

Workflows allow you to dynamically create execution frameworks that enable Claude to natively solve the above problems and many other types of issues within Claude Code. You can also share and reuse these workflows.

In this article, I’ll share my initial experiences and insights from using workflows to help you unlock their full potential.

However, it should be noted that relevant best practices are still evolving. Dynamic workflows typically consume more tokens, so you should carefully consider when and how to use them.

Note: This article is also published on the Claude Blog.

Example prompts

Before diving into the technical details, I’d like to show you some example prompts to help you understand the possibilities of workflows:

This test fails approximately once every 50 runs. Set up a workflow to reproduce it, formulate hypotheses, and conduct adversarial testing across different worktrees. /goal Do not stop until one hypothesis is verified.

Use the workflow to review my last 50 sessions, identify recurring corrections I've made, and turn these repeated issues into CLAUDE.md rules.

Use the workflow to review the past six months of the #incidents channel on Slack and identify recurring root causes that haven't had tickets submitted.

Run my business plan through a workflow where different agents analyze it from the perspectives of investors, customers, and competitors.

There is a folder containing 80 resumes. Use the workflow to sort them according to the backend position requirements and review the top ten. Use the AskUserQuestion tool to ask me questions to help establish your evaluation criteria.

I need to name this CLI tool. Use a workflow to brainstorm a list of options, then select the top three using a tournament mechanism.

Use the workflow to rename our User model to Account everywhere.

Read my blog draft and use the workflow to verify every technical claim against the codebase. I don’t want to publish anything incorrect.

How does the dynamic workflow work?

The dynamic workflow executes a JavaScript file containing several special functions for generating and coordinating sub-agents.

The dynamic workflow can also use standard JavaScript built-in objects such as JSON, Math, and Array for processing data.

Particularly noteworthy is that the dynamic workflow can determine which model a given agent uses and whether sub-agents run within their own worktree. This allows Claude to autonomously select the required level of intelligence and isolation based on task needs.
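
As a rough illustration of the shape such a script might take, the sketch below assumes a hypothetical `runAgent` helper that accepts a prompt, a model name, and a worktree flag. The article does not name the actual workflow functions or their options, so every identifier here is a stand-in, and the helper is simulated so the example runs on its own.

```javascript
// Hypothetical stand-in for a workflow's sub-agent primitive. The real
// function names and options are not given in this article.
async function runAgent({ prompt, model = "sonnet", worktree = false }) {
  // A real workflow would start a Claude sub-agent here; this stub only echoes.
  return { prompt, model, worktree, output: `done: ${prompt}` };
}

async function workflow(files) {
  // Default model, no isolation: a read-only planning step.
  const plan = await runAgent({
    prompt: `List risky changes in ${files.length} files`,
  });

  // Stronger model, one isolated worktree per file, all started in parallel.
  const edits = await Promise.all(
    files.map((file) =>
      runAgent({ prompt: `Refactor ${file}`, model: "opus", worktree: true })
    )
  );

  // Plain JavaScript built-ins handle the bookkeeping.
  return JSON.stringify({
    plan: plan.output,
    edited: edits.map((edit) => edit.prompt),
    isolated: edits.every((edit) => edit.worktree),
  });
}
```

The point of the sketch is only that the model and isolation choices are ordinary arguments made per sub-agent, so a single script can mix cheap and expensive steps.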

If a workflow is interrupted, for example, by a user manually performing an action or the terminal exiting, the session can be resumed and the workflow will continue from where it was interrupted.

Why is a dynamic workflow necessary?

When you have the default Claude Code execution framework handle a task, it must simultaneously plan and execute within the same context window. While this approach works very well for many programming tasks, it can sometimes fail in long-running, large-scale parallel, or highly structured adversarial tasks.

The reason is that the longer Claude processes complex tasks within a single context window, the more prone it becomes to certain types of failure modes:

Agentic laziness refers to Claude stopping early and declaring a task complete before it is actually finished, especially on complex, multi-part tasks. For example, it may finish only 20 of 50 items in a security review and then claim the work is done.

Self-preferential bias refers to Claude's tendency to favor its own results or findings, especially when asked to verify or evaluate its own output against a set of criteria.

Goal drift refers to the gradual decline in Claude’s adherence to the original goal during multi-round execution, especially after context is compressed. Each summary results in information loss, and certain detailed requirements—such as edge cases or constraints like “do not do X”—may be lost.

Creating a workflow helps alleviate these issues by orchestrating multiple independent Claudes, each with its own context window and focused on clearly defined, isolated tasks.

Dynamic workflows and static workflows

You may have previously created static workflows using the Claude Agent SDK or claude -p to coordinate multiple Claude Code instances.

However, because static workflows need to cover various edge cases, they are typically more generic. With the arrival of Claude Opus 4.8 and dynamic workflows, Claude is now intelligent enough to create a customized execution framework tailored to your specific use case.

Practical patterns when using dynamic workflows

You can directly have Claude create a dynamic workflow, or use the trigger word "ultracode" to ensure Claude Code creates a workflow.

However, if you can build a mental model of how dynamic workflows operate, it becomes easier to determine when to use them and to guide Claude through prompts.

When building workflows, Claude commonly uses and combines the following patterns:

Classify and execute: Use a classification agent to determine the task type, then route to the appropriate agent or action based on the type. A classifier can also be used at the end of the process to evaluate the output.

Fan-out and synthesis: Break a task into multiple smaller steps, with each step handled by a separate agent, then synthesize the results. This approach is especially suitable for tasks involving numerous small steps, or when each step requires a clean context window to avoid interference or cross-contamination. The synthesis step acts as a “barrier”: it waits for all fan-out agents to complete, then merges their structured outputs into a single result.

Adversarial verification: For each generated agent, run a separate agent to perform adversarial verification of its output according to a set of evaluation criteria or guidelines.

Generate and filter: Generate a large number of ideas around a topic, then screen them using evaluation criteria or validation processes to remove duplicates and return only the tested, highest-quality ideas.

Tournament: Instead of splitting the work, let agents compete with each other. Generate N agents, each attempting to complete the same task using different methods. Then, use a prompt or model to evaluate and pairwise compare the agents’ results until a winner is selected.

Loop until completion: For tasks with unknown workload, do not set a fixed number of rounds; instead, continuously generate agents until a stopping condition is met, such as no new discoveries appearing or no more errors occurring in the logs.
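
Two of the patterns above, fan-out with a synthesis barrier and loop until completion, combine naturally. The sketch below is a minimal illustration under stated assumptions: `findIssues` is a hypothetical sub-agent call, simulated here with a fixed backlog so the code is runnable, and the stopping rule is "one full round with no new discoveries."

```javascript
// Hypothetical sub-agent call, simulated so the sketch runs on its own.
// Each shard surfaces at most one issue per round until the backlog is empty.
const backlog = ["null deref", "race in cache", "stale lock", "off-by-one"];
async function findIssues(shard, round) {
  const index = round * 2 + shard;
  return index < backlog.length ? [backlog[index]] : [];
}

async function loopUntilDry(shards = 2, maxRounds = 10) {
  const seen = new Set();
  for (let round = 0; round < maxRounds; round++) {
    // Fan-out: one agent per shard. Promise.all acts as the barrier, so
    // nothing is merged until every agent in the round has returned.
    const results = await Promise.all(
      Array.from({ length: shards }, (_, shard) => findIssues(shard, round))
    );
    // Synthesis: merge the structured outputs and keep only what is new.
    const fresh = results.flat().filter((issue) => !seen.has(issue));
    fresh.forEach((issue) => seen.add(issue));
    // Stopping condition: a full round with no new discoveries.
    if (fresh.length === 0) return { rounds: round + 1, issues: [...seen] };
  }
  // Safety cap so a noisy agent cannot keep the loop alive forever.
  return { rounds: maxRounds, issues: [...seen] };
}
```

The `maxRounds` cap is an added safety net rather than part of the pattern as described; without it, an agent that keeps inventing findings would never let the loop end.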

Use cases

You can think more creatively about when and how to have Claude Code create dynamic workflows. I’ve found that workflows can be even more useful in non-technical work.

Migration and Refactoring

Bun was rewritten from Zig to Rust using workflows. You can read Jarred’s post on X to learn the details.

The key is to break the task into a series of work items, such as call sites, failing tests, or modules. Launch a sub-agent in its own worktree for each item to make the fix, then have another agent perform an adversarial review, and finally merge the results. You can explicitly instruct the agents to avoid resource-intensive commands, which maximizes parallelism without exhausting local machine resources.
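
The advice above controls load at the prompt level. One complementary guard, which the original post does not describe, is to cap how many sub-agents run at once inside the script itself. The sketch below assumes a hypothetical `fixCallSite` agent, simulated with a short timer, and a generic `mapWithLimit` helper written for this example.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight at once.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const index = next++; // no await between check and increment, so no race
      results[index] = await worker(items[index], index);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Hypothetical fix agent: a real one would edit code in its own worktree.
// The counters only exist to show that the cap is respected.
let inFlight = 0;
let peak = 0;
async function fixCallSite(site) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5));
  inFlight--;
  return `fixed ${site}`;
}
```

Results come back in input order regardless of which lane finished first, which keeps the later review and merge steps deterministic.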

In-depth research

We have released a deep research skill (/deep-research) in Claude Code that uses dynamic workflows. Specifically, it branches out to perform web searches, scrape sources, conduct adversarial validation of relevant claims, and synthesize a citation-rich report.

But this type of research isn't limited to web searches. For example, you can also have Claude compile a status report from Slack context or investigate how a feature works by exploring a codebase in depth.

Deep verification

On the other hand, if you have a report and want to verify every factual claim and source cited within it, you can generate a workflow: first, have one agent identify all factual claims, then launch a sub-agent for each claim to conduct a detailed verification. You can also include a verification agent to review the sub-agents responsible for sourcing, ensuring their sources meet a high standard of quality.
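
The three stages described above (extract claims, verify each one, audit the sources) can be sketched as a small pipeline. All three agent functions below are hypothetical and simulated with canned data; a real workflow would replace each with a sub-agent call.

```javascript
// Hypothetical agents, simulated with fixed data so the sketch is runnable.
async function extractClaims(report) {
  // Stand-in for an agent that lists factual claims; here, one per sentence.
  return report
    .split(". ")
    .map((sentence) => sentence.replace(/\.$/, "").trim())
    .filter(Boolean);
}

async function verifyClaim(claim) {
  // Stand-in for a per-claim verifier that returns a verdict and its source.
  const known = { "Water boils at 100 C at sea level": "physics handbook" };
  const source = known[claim] ?? null;
  return { claim, supported: source !== null, source };
}

async function auditSource(result) {
  // Stand-in for the reviewer that checks whether the cited source is good enough.
  const trusted = new Set(["physics handbook"]);
  return { ...result, sourceOk: result.source !== null && trusted.has(result.source) };
}

// Returns the claims that could not be backed by a trusted source.
async function deepVerify(report) {
  const claims = await extractClaims(report);
  const verdicts = await Promise.all(claims.map((claim) => verifyClaim(claim)));
  const audited = await Promise.all(verdicts.map((verdict) => auditSource(verdict)));
  return audited
    .filter((result) => !(result.supported && result.sourceOk))
    .map((result) => result.claim);
}
```

Returning only the failing claims keeps the final context small: the orchestrating session sees a short list to fix rather than every verdict.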

Sorting

You may have a set of items you want to rank according to a qualitative metric, and you believe Claude Code excels at evaluating such metrics. For example, ranking support tickets by bug severity.

But if you try to sort more than a thousand items within a single prompt, quality declines and the context window cannot accommodate them. A better approach is to run a tournament: build a pipeline of pairwise comparison agents, since comparative judgments are typically more reliable than absolute scoring. Alternatively, first perform a parallel bucket sort and then merge the results. Each comparison is performed by an independent agent, so a deterministic loop can maintain the tournament structure, with only the current sequence of operations needing to remain in context.
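
One way to structure pairwise ranking is an ordinary merge sort whose comparator is an agent call. The sketch below is an illustration, not the mechanism Claude Code necessarily generates: `ranksHigher` is a hypothetical comparison agent, simulated with a severity table so the code runs offline.

```javascript
// Hypothetical comparison agent: resolves to true if `a` should rank above `b`.
// A real workflow would ask a fresh sub-agent; a severity table simulates it.
const severity = { crash: 3, "data loss": 4, typo: 1, "slow page": 2 };
async function ranksHigher(a, b) {
  return severity[a] >= severity[b];
}

// Deterministic merge sort: the script owns the structure, and agents only
// ever judge one pair at a time.
async function rankByComparison(items) {
  if (items.length <= 1) return items;
  const mid = Math.floor(items.length / 2);
  // The two halves are independent, so they can be ranked in parallel.
  const [left, right] = await Promise.all([
    rankByComparison(items.slice(0, mid)),
    rankByComparison(items.slice(mid)),
  ]);
  const merged = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    const leftWins = await ranksHigher(left[i], right[j]);
    merged.push(leftWins ? left[i++] : right[j++]);
  }
  // One side is exhausted; the rest of the other side is already in order.
  return merged.concat(left.slice(i), right.slice(j));
}
```

Merge sort needs on the order of n log n comparisons, so a thousand items means roughly ten thousand small agent calls. That cost is why the bucket-then-merge variant mentioned above can be attractive for large lists.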

Memory and Rule Compliance

If you have a set of specific rules that Claude frequently misses or performs poorly on, even when those rules are present in CLAUDE.md, you can create a workflow that lists these rules and assigns a verification agent to check each one individually—each rule corresponds to one verification agent. Creating a sub-agent with a "skeptic" persona to review whether the rules themselves are reasonable can also help reduce false positives.

The reverse is also possible: mine your recent conversations and code review comments to identify recurring corrections you've made; have a parallel agent cluster these issues; then perform adversarial validation on each candidate rule to determine whether it truly prevents a real error; finally, refine the rules that pass the screening back into CLAUDE.md.

Root cause investigation

The most effective way to debug is to formulate several independent hypotheses and test them one by one. However, if you use only a single context window, Claude may fall into the self-preferential bias described earlier.

The workflow can structurally prevent this situation: it can initiate multiple agents to generate hypotheses based on non-overlapping evidence. For example, different agents can examine logs, files, and data separately. Each hypothesis can then be reviewed by a set of validators and refuters.
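
A minimal sketch of that structure follows, assuming hypothetical `proposeHypothesis` and `review` agents that are simulated with canned evidence. Each proposer sees only its own slice of evidence, and a hypothesis survives only if a validator supports it and a refuter fails to knock it down.

```javascript
// Canned evidence slices; in a real investigation each would be gathered by
// a separate agent reading logs, files, or data.
const evidence = {
  logs: "connection pool exhausted at 02:14",
  files: "retry limit raised from 3 to 30 in config",
  data: "row counts normal",
};

// Hypothetical proposer: it receives one evidence slice and nothing else.
async function proposeHypothesis(sourceName, text) {
  return { from: sourceName, claim: `Cause suggested by ${sourceName}: ${text}` };
}

// Hypothetical reviewer, simulated: the "data" hypothesis is treated as weak.
// A validator resolves to true when it finds support; a refuter resolves to
// true when it finds a counterexample.
async function review(hypothesis, role) {
  const weak = hypothesis.from === "data";
  return role === "validator" ? !weak : weak;
}

async function investigate() {
  const hypotheses = await Promise.all(
    Object.entries(evidence).map(([name, text]) => proposeHypothesis(name, text))
  );
  const judged = await Promise.all(
    hypotheses.map(async (hypothesis) => {
      const [validated, refuted] = await Promise.all([
        review(hypothesis, "validator"),
        review(hypothesis, "refuter"),
      ]);
      return { ...hypothesis, survives: validated && !refuted };
    })
  );
  return judged.filter((h) => h.survives).map((h) => h.from);
}
```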

This isn't limited to code. Workflows can also be used for sales analysis, such as "Why did sales drop in March?"; for data engineering, such as "Why did this pipeline fail?"; or for any post-mortem review.

Mass triage

Every team has support queues, bug reports, or other backlogs that cannot be fully handled by humans. A triage workflow can categorize each item, deduplicate it against already tracked issues, and take action—whether that means attempting a fix or escalating it to a human for handling.

For the triage workflow, a useful pattern is quarantine. That is, prohibit agents that read untrusted public content from performing high-privilege operations; high-privilege operations should be carried out by agents specifically responsible for actions.
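
The quarantine idea can be sketched as a per-role tool allowlist. Everything below is hypothetical: the role names, the tool names, and the `makeAgent` helper are invented for illustration, and the reader's classification is simulated with a regular expression rather than a model call.

```javascript
// Hypothetical tool registry: which tools each role may call.
const TOOLS = {
  reader: new Set(["fetchTicket", "summarize"]),
  actor: new Set(["closeTicket", "openPullRequest"]),
};

function makeAgent(role) {
  return {
    role,
    call(tool, payload) {
      // Enforced in code, so a prompt injection cannot talk its way past it.
      if (!TOOLS[role].has(tool)) {
        throw new Error(`${role} agent may not call ${tool}`);
      }
      return { tool, payload };
    },
  };
}

function triage(ticketText) {
  const reader = makeAgent("reader");
  const actor = makeAgent("actor");
  // The reader touches untrusted text but only has low-privilege tools.
  reader.call("summarize", ticketText);
  // Stand-in for the reader's classification, limited to a fixed vocabulary.
  const verdict = /duplicate/i.test(ticketText) ? "duplicate" : "needs-human";
  // Only the fixed verdict crosses the boundary, never the raw ticket text.
  return verdict === "duplicate"
    ? actor.call("closeTicket", { reason: verdict })
    : { tool: "escalate", payload: { reason: verdict } };
}
```

The design choice worth noting is that the actor never sees the untrusted text at all. Even if a ticket contains instructions aimed at the model, the only thing that reaches the privileged side is one of two fixed strings.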

You can pair triage workflows with /loop to have Claude continuously perform these tasks.

Exploration and taste judgment

Workflows are useful when you need to explore different paths to a solution, especially for design, naming, and other tasks involving aesthetic judgment, and when you can benefit from a set of evaluation criteria.

You can have Claude explore a wide range of options and provide the review agent with a set of criteria for what constitutes a "good solution." When the review agent determines that the results meet these criteria, the task is complete. Different solutions can also be ranked or filtered using this evaluation framework through a tournament-style mechanism.
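
As a small illustration of exploring widely and then filtering against explicit criteria, the sketch below assumes a hypothetical `generateNames` agent, simulated with canned candidates, and expresses the review criteria as named predicates. In practice the criteria would more likely be a prose rubric handed to a review agent; predicates are used here only so the example is runnable.

```javascript
// Hypothetical generator agents, simulated with canned candidates per theme.
async function generateNames(seed) {
  const pools = {
    speed: ["Dash", "Bolt", "dash", "Quicksilver-CLI-Tool"],
    craft: ["Forge", "Bolt", "Loom"],
  };
  return pools[seed] ?? [];
}

// Review criteria as named predicates (assumed for illustration).
const criteria = {
  short: (name) => name.length <= 8,
  oneWord: (name) => /^[A-Za-z]+$/.test(name),
};

async function generateAndFilter(seeds) {
  // Explore: one generator per theme, run in parallel.
  const batches = await Promise.all(seeds.map((seed) => generateNames(seed)));
  // Deduplicate case-insensitively, keeping the first spelling seen.
  const unique = new Map();
  for (const name of batches.flat()) {
    const key = name.toLowerCase();
    if (!unique.has(key)) unique.set(key, name);
  }
  // Filter: keep only candidates that pass every criterion.
  return [...unique.values()].filter((name) =>
    Object.values(criteria).every((passes) => passes(name))
  );
}
```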

Evaluations

You can run lightweight evals for specific tasks by launching an independent agent in the worktree, followed by a comparison agent that evaluates and scores the output based on predefined criteria. For example, you can assess and improve a skill you created to see if it meets certain specific standards.

Model and intelligence routing

You can create a classification agent tuned for your specific task to determine which model to use. This approach is useful when a task involves extensive tool use and some upfront research helps identify the most suitable model.

For example, for the task "Explain how the auth module works," the most suitable model depends on the number of files in the auth module and the structure of the codebase. The classification agent can first conduct this research and then route the task to Sonnet or Opus based on the expected complexity.
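
The routing step in that example can be sketched as a scout followed by a threshold rule. The `scoutModule` function is a hypothetical stand-in for the classification agent, and the thresholds are illustrative assumptions rather than recommended values.

```javascript
// Hypothetical scout step: a real classifier agent would explore the codebase.
// Here it just aggregates file metadata passed in by the caller.
async function scoutModule(files) {
  const totalLines = files.reduce((sum, file) => sum + file.lines, 0);
  return { fileCount: files.length, totalLines };
}

// Thresholds are illustrative assumptions, not recommended values.
function pickModel({ fileCount, totalLines }) {
  return fileCount > 20 || totalLines > 5000 ? "opus" : "sonnet";
}

async function routeTask(task, files) {
  const stats = await scoutModule(files);
  return { task, model: pickModel(stats), stats };
}
```

For instance, `routeTask("Explain how the auth module works", [{ path: "auth/index.js", lines: 120 }])` resolves with `model: "sonnet"`, while a module with thousands of lines spread across many files would be routed to the larger model.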

When should you not use a dynamic workflow?

Workflows are still new. While they can deliver far greater results than conventional methods in many use cases, not every task requires them, and they may significantly increase token consumption.

Use workflows for tasks that can extend the boundaries of Claude Code’s capabilities in new ways. For routine programming tasks, ask yourself first: Does this task truly require more computational resources? For example, most traditional programming tasks don’t need a team of five reviewers.

Tips for building dynamic workflows

Prompt design

When writing prompts for dynamic workflows, the more detailed the information, the better the results—especially when using the specific techniques mentioned above.

Workflows are not only for large tasks. You can also prompt the model to use a "quick workflow." For example, you can create a rapid adversarial review process to check a hypothesis.

Use with /goal and /loop

When using reusable workflows, such as triage, research, or verification workflows, pair them with /loop to run them at fixed intervals, and use /goal to set strict completion requirements.

Token budget

You can give a dynamic workflow an explicit token budget to cap how many tokens it consumes. State the budget in the prompt, for example “use 10k tokens,” and the upper limit is set to 10k tokens.
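
How Claude Code enforces a prompt-level budget internally is not described here. Purely to illustrate the idea of a hard cap, the sketch below assumes a hypothetical `TokenBudget` guard that a script could consult before spawning each sub-agent; the per-task estimates are made-up numbers.

```javascript
// Hypothetical budget guard, written for illustration only.
class TokenBudget {
  constructor(limit) {
    this.limit = limit;
    this.used = 0;
  }
  canAfford(estimate) {
    return this.used + estimate <= this.limit;
  }
  record(actual) {
    this.used += actual;
  }
}

async function runWithinBudget(tasks, budget) {
  const done = [];
  for (const task of tasks) {
    // Stop spawning once the next task would exceed the cap; keep results so far.
    if (!budget.canAfford(task.estimate)) break;
    // Stand-in for a sub-agent call that reports its real token usage.
    const usage = task.estimate;
    budget.record(usage);
    done.push(task.name);
  }
  return { done, used: budget.used, skipped: tasks.length - done.length };
}
```

With a 10,000-token limit and three tasks estimated at 4,000 tokens each, the first two run and the third is skipped, leaving 8,000 tokens used.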

Save and share dynamic workflows

You can press 's' in the workflow menu to save a workflow. Saved workflows can live in ~/.claude/workflows, or you can distribute them through a skill.

To share them through a skill, place the JavaScript workflow files in the skill folder and reference them in SKILL.md. For greater flexibility, you can also prompt Claude to treat the workflows within the skill as templates rather than scripts that must be executed verbatim.

A whole new world

Workflows are a useful new way to extend Claude Code. We encourage you to view this as a starting point: there’s still much to explore about how to use them effectively. We’d love to hear what you discover.

Thariq Shihipar and Sid Bidasaria (@sidbid) are members of Anthropic’s engineering team working on Claude Code.
