How to Use Claude's Dynamic Workflows for In-Depth Research

Over the past three years, I’ve become dependent on AI to assist with industry research, and I’ve even built a series of skills and auxiliary systems to solve information filtering, synthesis, connection, verification, and retention.

Only after spending this week deep in Claude Code's dynamic workflows did I truly understand the saying: "Don't fight the tide of the times."

It made me think again: what kind of in-depth research should humans undertake in the AI era, and how can I build a collaborative, complementary relationship with AI?

I. Starting with the pitfalls of research

Technical research is inherently full of pitfalls, for humans and AI alike. From the very beginning you are bombarded with vast amounts of information, opinions grow more divergent, and conclusions grow less clear. That's why it's essential to keep returning to your original objective.

This has also been a longstanding limitation of AI: its attention and association are constrained by the volume of information currently in context, and it is weak at making truly valuable cross-domain connections.

Where AI excels is in its execution—it can systematically search, categorize, and summarize in layers, completely avoiding the loss of detail.

Although I haven’t published much on my public account over the past six months, I’ve been closely following and researching nearly all major battlegrounds in the industry, supported by my own deep-research system.

Meanwhile, with Claude Code launching the Dynamic Workflows feature last week, I wanted to pit them against each other to see if its default capabilities can fully surpass my own.

II. What are Dynamic Workflows?

Dynamic Workflows center on the idea that before executing a task, AI automatically designs the optimal workflow to accomplish it, and then initiates execution.

This is fundamentally different from the "planning mode" and "skills" we used before. Planning mode breaks a task into finer steps, but those steps do not necessarily follow a logical workflow. There, validation criteria (which are crucial for research) and harness rules only exist if you write them into carefully structured prompts yourself.

But dynamic workflows automatically incorporate acceptance logic, result convergence, and adversarial validation.

The trigger is simple: run /deep-research in CC (Claude Code), then provide some research templates and starting materials. To use the dynamic workflow feature on its own, ask for it in your prompt or simply say "ultracode." Note that token consumption will be roughly dozens of times higher than usual.

III. Six built-in workflow modes

Beneath the dynamic workflow are six core scheduling patterns summarized by the official team, which is why it is stronger than ordinary chat/agent/skill systems.

In fact, only two core questions sit behind these six modes: how to break the task down, and how to combine the results. The six modes are essentially different combinations of answers to these two questions.

3.1 Routing Mode (Classify-and-Act)

First, an agent identifies the task type, then routes the task to the most suitable specialized agent. The core logic lies in the routing decision, not in parallelism or iteration. Each task follows only one path, with all other paths left entirely unexecuted.

For example, I can start with three preset subagent roles: an analysis agent that strictly verifies data, an output agent skilled in writing, and a challenge agent specialized in finding vulnerabilities. The routing layer will determine which subagent is best suited for each subtask, rather than having one agent handle everything.

The value of this model lies in its precision and efficiency: each agent’s prompt can be highly independent, unaffected by other objectives, enabling deep, vertical exploration. It consumes the fewest tokens and delivers the fastest response times, with clearly defined responsibilities.

The drawback is also significant: it handles ambiguous tasks poorly (for example, a request that is "both a technical issue and an account issue"), because each task can follow only one path.
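To make the routing idea concrete, here is a minimal Python sketch under stated assumptions. The three agent functions mirror the roles I described above, but they are hypothetical stand-ins (a real subagent would call a model), and the keyword classifier is only a toy replacement for the routing agent. None of this is Claude Code's actual API.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for specialized subagents; a real one would call a model.
def analysis_agent(task: str) -> str:
    return f"[analysis] verified data for: {task}"

def output_agent(task: str) -> str:
    return f"[output] drafted text for: {task}"

def challenge_agent(task: str) -> str:
    return f"[challenge] looked for holes in: {task}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "analysis": analysis_agent,
    "output": output_agent,
    "challenge": challenge_agent,
}

def classify(task: str) -> str:
    """Toy keyword classifier standing in for the routing agent (an assumption)."""
    text = task.lower()
    if any(word in text for word in ("verify", "data", "measure")):
        return "analysis"
    if any(word in text for word in ("attack", "risk", "flaw")):
        return "challenge"
    return "output"

def route(task: str) -> str:
    """Pick exactly one path; the other agents are never invoked."""
    return AGENTS[classify(task)](task)

result = route("Verify the reported throughput data")
```

Note that a task mentioning both "data" and "risk" would still go to the analysis agent only, which is exactly the weakness with ambiguous tasks.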

3.2 Fan-out & Merge

This is my most commonly used pattern, and its core logic is parallel execution plus merging. The task is split into N independent subtasks that run simultaneously, and the results are merged once all of them have completed.

The advantages lie in speed and isolation. The total time is approximately equal to that of the slowest subtask, not the sum of all subtasks. Each subtask has its own independent context, does not interfere with others, and is not affected by noise from any other subtask.

The weakness is that token cost grows roughly N-fold, and the synthesis layer is hard in its own right: fusing N outputs with inconsistent structures is a design challenge. Poorly defined subtasks can also lead to gaps or overlapping coverage.
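A rough Python sketch of fan-out and merge, assuming each subagent can be modeled as an independent function call. `research_subtask` is a hypothetical placeholder that sleeps to simulate work, and the merge step here is a trivial join rather than a real synthesis layer.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def research_subtask(topic: str) -> Dict[str, str]:
    """Hypothetical subagent: its only context is its own topic."""
    time.sleep(0.1)  # simulate the time a real subagent would take
    return {"topic": topic, "finding": f"summary of {topic}"}

def fan_out_and_merge(topics: List[str]) -> str:
    # Fan out: run every subtask in parallel, one worker per topic.
    with ThreadPoolExecutor(max_workers=len(topics)) as pool:
        results = list(pool.map(research_subtask, topics))  # keeps input order
    # Merge: a real synthesis layer would reconcile differently shaped outputs.
    return "\n".join(f"{r['topic']}: {r['finding']}" for r in results)

topics = ["official docs", "source code", "market sentiment"]
start = time.perf_counter()
report = fan_out_and_merge(topics)
elapsed = time.perf_counter() - start  # wall-clock time for the whole fan-out
```

Because the three subtasks run concurrently, `elapsed` should land near the duration of the slowest subtask rather than the sum of all three.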

3.3 Adversarial Verification

The core logic is to have multiple agents challenge the same conclusion from the opposing side; the conclusion passes only if a majority of them accept it.

The advantage is that, since the Verifier does not know the Worker’s reasoning and only evaluates the outcome, this structure inherently eliminates the self-evaluation bias that occurs when a model checks its own code.

This approach solves a long-standing issue for me: we often speak to AI in casual language, but AI tends to align with our expectations, leading to confirmation bias. By using adversarial validation, we force AI to seek counterexamples and verify based on data and experiments, rather than catering to our assumptions.

However, if the verifier makes an incorrect judgment, it can mislead the worker into catering to the verifier instead. It is therefore better to base verdicts on reproducible facts rather than opinions.

Half-jokingly: if you ask an AI to find problems, it will keep finding them forever, so you need to define the boundaries of what kinds of problems it should look for.
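Here is a small Python sketch of majority-vote verification, under the assumption that each verifier sees only the finished claim and its cited evidence, never the worker's reasoning. The `Claim` type and the three checks are hypothetical; they are deliberately fact-based and fixed in number, so the set of problems to look for is bounded.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Claim:
    statement: str
    evidence: List[str] = field(default_factory=list)  # kinds of sources cited

Verifier = Callable[[Claim], bool]

# Hypothetical verifiers: each checks one reproducible property of the result.
def cites_reproducible_source(claim: Claim) -> bool:
    return any(e in ("source code", "on-chain data") for e in claim.evidence)

def cites_multiple_sources(claim: Claim) -> bool:
    return len(set(claim.evidence)) >= 2

def is_specific(claim: Claim) -> bool:
    return len(claim.statement.split()) >= 5

VERIFIERS: List[Verifier] = [
    cites_reproducible_source,
    cites_multiple_sources,
    is_specific,
]

def passes_majority(claim: Claim, verifiers: List[Verifier]) -> bool:
    """The claim passes only if more than half of the verifiers accept it."""
    votes = [verify(claim) for verify in verifiers]
    return sum(votes) * 2 > len(votes)

strong = Claim("The new feature reduces fees for small transfers",
               ["official docs", "source code"])
weak = Claim("It is better", ["official docs"])
strong_ok = passes_majority(strong, VERIFIERS)
weak_ok = passes_majority(weak, VERIFIERS)
```

In a real setup each verifier would be a separate agent with its own prompt; the fixed list plays the role of the boundary on what they are allowed to object to.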

3.4 Generate & Filter

The core logic is divergence followed by convergence. First, deliberately generate an excess of candidates, then use a rubric to eliminate all but the finest, retaining only high-confidence results for output.

Rather than having an agent output a single "okay" answer, it's better to generate ten and then use a verification layer to filter them. The advantage lies in diversity: multiple generators can employ different strategies and prompts to produce solutions that are hard for humans to anticipate, while the filtering step ensures the final output is highly refined.

The weakness is that the quality of Filter's rubric directly determines the final outcome; an incorrectly designed rubric renders the entire process ineffective.

Suitable scenarios include situations where the correct answer is unknown beforehand, requiring selection among multiple possibilities, and where diversity is explicitly needed.

On the surface it resembles Fan-out & Merge (Fanout-And-Synthesize): both follow a "multi-path parallel → single output" structure, which makes these two the easiest to confuse.

The key difference lies in intent. In Fan-out, each branch handles a different part of the task and the results are complementary: when merged, every branch contributes. In Generate-And-Filter, each branch handles the same task and the results are competitive: when merged, most are discarded. The former is a "puzzle," the latter is a "beauty contest."
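A minimal Python sketch of generate and filter, assuming candidates are plain strings and the rubric is a list of yes/no checks. The three strategies and the four rubric criteria are invented for illustration; in practice the generators would be differently prompted agents and the rubric would be your own.

```python
from typing import Callable, Dict, List

# Hypothetical generators: every strategy answers the SAME task differently.
STRATEGIES: Dict[str, Callable[[str], str]] = {
    "terse": lambda goal: f"{goal}: adopt it.",
    "balanced": lambda goal: (
        f"For {goal}, you should pilot it on one project because the cost is low."
    ),
    "rambling": lambda goal: "There are many considerations. " * 10,
}

def rubric_score(candidate: str, goal: str) -> int:
    """Count how many rubric criteria a candidate meets (criteria are assumptions)."""
    checks = [
        goal in candidate,        # stays on the original objective
        "should" in candidate,    # makes a recommendation
        "because" in candidate,   # gives a reason
        len(candidate) <= 200,    # stays concise
    ]
    return sum(checks)

def generate_and_filter(goal: str, min_score: int = 3) -> List[str]:
    # Diverge: produce one candidate per strategy.
    candidates = [make(goal) for make in STRATEGIES.values()]
    # Converge: score each candidate and keep only those that clear the bar.
    scored = [(rubric_score(c, goal), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score >= min_score]

kept = generate_and_filter("evaluating dynamic workflows")
```

Unlike fan-out, most candidates are thrown away here, and changing `rubric_score` changes which ones survive, which is why the rubric matters so much.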

3.5 Tournament Mode

The core logic is competition and elimination. N agents independently perform the same task, and through pairwise comparisons over multiple rounds, the least effective ones are eliminated until the optimal solution is selected.

I used to do this manually—running two or three versions of the same code change and having AI compare which one was better. Now it can be directly integrated into the workflow.

The advantage lies in evaluating stability. Pairwise comparisons ("Which is better, A or B?") are significantly more stable than absolute scoring ("Score A"), as they eliminate the issue of scoring criterion drift. The results, refined through multiple rounds of competition, yield highly credible winners.

It also appears similar to Generate-And-Filter: both select the best option from multiple candidates. The key difference lies in the selection mechanism: Tournament uses pairwise comparison to pit candidates against each other, making it more reliable when the rubric is hard to quantify and judgment is inherently relative.
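The elimination logic can be sketched in Python as a single-elimination bracket. This is only an illustration under assumptions: `judge` is a hypothetical stand-in for an agent answering "Which is better, A or B?", and here it simply prefers the draft that covers more of a fixed keyword list.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def run_tournament(candidates: List[T], better: Callable[[T, T], T]) -> T:
    """Single elimination: compare in pairs, winners advance to the next round."""
    if not candidates:
        raise ValueError("need at least one candidate")
    current = list(candidates)
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            next_round.append(better(current[i], current[i + 1]))
        if len(current) % 2 == 1:
            next_round.append(current[-1])  # odd one out gets a bye
        current = next_round
    return current[0]

REQUIRED_POINTS = ("cost", "risk", "timeline")

def judge(a: str, b: str) -> str:
    """Hypothetical pairwise judge: a relative comparison, not an absolute score."""
    def coverage(text: str) -> int:
        return sum(point in text for point in REQUIRED_POINTS)
    return a if coverage(a) >= coverage(b) else b  # ties go to the first draft

drafts = [
    "covers cost",
    "covers cost and risk",
    "covers risk, cost and timeline",
    "covers nothing relevant",
]
winner = run_tournament(drafts, judge)
```

With N candidates this bracket needs N - 1 pairwise comparisons, so the cost of judging grows with the number of drafts you generate.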

3.6 Loop Mode

The core logic is adaptive iteration: continuously attempt, collect error information when encountering obstacles, enrich the context, and retry until the acceptance criteria are met.

At its core, it’s about countering AI’s randomness: try multiple times, and you’ll eventually stumble upon a better result. But a more refined approach is to combine it with adversarial validation, so each iteration executes with more informed guidance rather than relying purely on chance.

The advantage lies in its ability to handle tasks with unknown workloads. The other five modes all assume that task boundaries are defined, while Loop Until Done is the only mode capable of handling situations where the number of iterations is unknown.

A weakness is the potential risk of uncontrolled behavior—poorly designed stopping conditions can lead to infinite loops. Each round’s agent operates with a fresh context and cannot accumulate state across rounds (unless explicitly written to a file).
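A minimal Python sketch of a loop with an explicit stopping condition, assuming the worker starts each round with a fresh context and only learns from notes passed in explicitly. The `attempt` and `accept` functions are hypothetical, and the in-memory `notes` list stands in for the file that would carry state across rounds.

```python
from typing import Callable, List, Optional, Tuple

Report = List[str]

def loop_until_done(
    attempt: Callable[[List[str]], Report],
    accept: Callable[[Report], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[Optional[Report], int]:
    """Retry until the acceptance check passes or the round budget runs out."""
    notes: List[str] = []  # stand-in for state written to a file between rounds
    for round_no in range(1, max_rounds + 1):
        report = attempt(notes)
        ok, feedback = accept(report)
        if ok:
            return report, round_no
        notes.append(feedback)  # enrich the context for the next round
    return None, max_rounds  # hard cap prevents an infinite loop

REQUIRED_SECTIONS = ["sources", "risks", "recommendation"]

def attempt(notes: List[str]) -> Report:
    """Hypothetical worker: knows nothing except what the notes tell it."""
    return ["summary"] + list(notes)

def accept(report: Report) -> Tuple[bool, str]:
    """Acceptance criteria: every required section must be present."""
    missing = [s for s in REQUIRED_SECTIONS if s not in report]
    return (not missing, missing[0] if missing else "")

report, rounds = loop_until_done(attempt, accept, max_rounds=5)
capped_report, capped_rounds = loop_until_done(attempt, accept, max_rounds=2)
```

The `max_rounds` cap is the design choice that matters: without it, an acceptance check that can never pass would keep the loop running and burning tokens.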

IV. My skills vs. the official workflow

Before the dynamic workflow was released, I specifically designed my own deep-research system. The logic behind my skill set was something like this:

1. Provide a simple update as input (e.g., a project has launched a new feature).
2. Have AI search for all relevant materials: official documentation, source code, and market sentiment.
3. Compress the information into a meaningful summary.
4. Have multiple agent roles conduct adversarial analysis and generate a report.
5. Deduplicate automatically, because content from multiple agents overlaps heavily.

I've been using it for a while, and I find it quite useful. However, it has a fundamental flaw: a lack of goal-oriented convergence.

Even with the deduplication in step five, valuable information often gets removed; without it, you are likely to receive a ten-thousand-word article packed with details that never clearly tells you, "What does this mean for you? What should you do?"

But research exists to serve decision-making, which is why many skills stay stuck at the research stage: they score 80 points but miss the most critical 20.

Even after the initial research is complete, it still takes about ten more rounds of thinking and dialogue with the AI to reach a satisfactory and comprehensive conclusion.

What additional steps does the official dynamic workflow perform?

Through experiments with several complex research tasks this week, I found that Claude Code’s built-in deep research workflow (note: not just a skill, but a module compiled and embedded into CC) includes several key additional steps compared to my own skills:

Problem decomposition layer: It doesn’t start searching immediately; instead, it first asks questions to break my issue into multiple sub-questions: What are you truly trying to understand? How is this related to you? Which dimensions deserve deeper exploration? I used to skip this step.
Trustworthiness assessment: It evaluates each piece of information for credibility and falsifiability, similar to authority scoring in traditional SEO: how credible is the source, and how many citations does it have? This is a step I hadn't previously considered adding.
Cross-deletion instead of merging everything: Previously, I merged all conclusions together, which produced very large documents. The dynamic workflow has multiple agents vote on each conclusion and removes those that don't receive enough votes, rather than simply merging them.
Goal-oriented output: The final report is not a pile of information but a judgment and recommended solution centered on your original objective. The key to achieving this lies in leveraging the preset capabilities of multiple sub-agents. Previously, my skills often lost goal orientation because the weight of the original instructions faded after processing vast amounts of information.
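The cross-deletion step can be sketched in Python as a simple vote threshold. This is my own illustration under assumptions, not the built-in implementation: I treat an agent listing a conclusion as one vote for it, and the agent names, conclusions, and threshold are made up for the example.

```python
from collections import Counter
from typing import Dict, List

def cross_delete(agent_conclusions: Dict[str, List[str]], min_votes: int) -> List[str]:
    """Keep only conclusions endorsed by at least `min_votes` agents."""
    votes: Counter = Counter()
    for conclusions in agent_conclusions.values():
        votes.update(set(conclusions))  # one vote per agent per conclusion
    return sorted(c for c, n in votes.items() if n >= min_votes)

# Illustrative data only: three hypothetical agents reviewing the same update.
agent_conclusions = {
    "docs_agent": ["feature lowers fees", "feature is audited"],
    "code_agent": ["feature lowers fees", "upgrade is permissioned"],
    "chain_agent": ["feature lowers fees", "upgrade is permissioned", "usage is growing"],
}

kept = cross_delete(agent_conclusions, min_votes=2)
merged_everything = cross_delete(agent_conclusions, min_votes=1)
```

With `min_votes=1` this degenerates into my old merge-everything behavior; raising the threshold is what shrinks the report down to conclusions that several agents support.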

What problems do these mechanisms solve?

These mechanisms target several typical issues that arise when AI performs long tasks:

Goal drift: The task starts off well, loses focus in the middle, then regains momentum by the end—similar to a person zoning out during a class. The longer the task, the more pronounced this becomes.

Premature stopping: While running, the AI encounters difficulties and thinks it has "completed" the task, when in fact the acceptance criteria have not been met.

Context pollution: When a single agent performs complex tasks, excessive prior prompts compress the available space for subsequent execution. A better approach is to limit prior prompts to a few KB and distribute the context across multiple agents.

Output bias: AI tends to align with your expectations; conversational phrasing is more likely to trigger this issue.

Dynamic workflows address these four issues in a structured way: they automatically incorporate acceptance criteria to prevent premature stopping; parallelize work and isolate contexts to limit context pollution; use adversarial verification to counter output bias; and decompose problems with layered constraints so that the AI understands the goal before acting, which guards against goal drift.

V. Summary

Finally, as a long-term researcher, I am awestruck by this new CC mechanism, whose six built-in modes—routing selection, splitting and merging, adversarial verification, generation filtering, tournament competition, and loop cycling—cover the scheduling needs of the vast majority of complex research tasks.

I no longer need to manually design agent scheduling or handle deduplication and cross-validation myself—these are now built into the workflow itself.

It is particularly well suited to reasoning with limited information and open-ended questions, because its built-in multi-agent scheduling and task decomposition further enhance its generality. In fact, AI already performed well three years ago on tightly constrained problems with clearly defined objectives. The true qualitative leap in AI, however, lies in its generality: moving from simple code to a true Agent, and from rigidly solving a single problem to adapting to any problem.

So Dynamic Workflows are not "smarter one-time conversations"; they give structure to the research process itself.

I used to need ten or more separate conversations for a piece of research; now that is down to 3-4, although the corresponding token consumption has increased by dozens of times.

Why are 3-4 still needed? I believe the root cause lies in the gaps that remain in the following areas.

First is the rigor of the verification mechanism. I primarily research new technologies on blockchains, where official documentation is often outdated, and more reliable references include open-source code, on-chain transactions, and other data. Currently, AI still defaults to official documentation rather than fact-based verification.

Second is deep thinking that transcends boundaries. Although some of this can be addressed through workflow presets (predefining sub-agents across various dimensions to analyze the same issue), AI still excels primarily in mainstream thinking models and is somewhat limited when it comes to highly novel, profound insights lacking sufficient data support.

Third is solution design and validation: the value of a solution lies not in its proposal, but in its validation and support, which relies on measuring existing mechanisms, inputs, and costs. While fine-tuning AI could yield better results, this would contradict its generality.

Finally, there is extreme information condensation, which requires understanding your audience's level of knowledge: some have no background and need plain, personified explanations, while others need just one sentence to be convinced.