New Book on Agentic Design Patterns Reshapes Understanding of AI Agents

Author: Yanhua

Antonio Gullí is the Director of Engineering at Google. He wrote a 453-page book that breaks down AI Agent development into 21 design patterns.

But this is not a book review. My motivation for reading this book was specific: I’ve written about Harness Engineering, about my experiences with Clawdbot, and about “AI agents aren’t magic”—the seven turning points from burning tokens to making them truly useful. After each piece, there was a question I hadn’t fully resolved: Is there a reusable underlying logic behind all of this?

This book gave me the answer, and it went deeper than I expected.

What you wrote might not be an Agent at all

The most ruthless judgment in the book is hidden in the prologue.

Most people are using “AI” at Level 0: a bare LLM, with no tools, no memory, and no ability to act. You ask it which film will win Best Picture at the 2025 Oscars, and it guesses. The book is clear: Level 0 systems are not Agents.

Real Agents start one level up:

Level 1: Tool User
The agent now uses tools: search, APIs, databases. But being able to call interfaces is not enough. It must also decide on its own when to call them, what to call, and how to use the results. The book provides a concrete example: when a user asks, “What new shows are out recently?”, the agent recognizes that this information isn’t in its training data, proactively invokes a search tool, and then synthesizes the results. The critical step is “recognizing on its own.” No human tells it, “Go search.” The agent determines autonomously that a search is needed. This ability to judge is the threshold for Level 1.
Level 2: Strategic Thinker
This level adds two things: planning and Context Engineering. The book defines Context Engineering not as accumulating information, but as carefully selecting, trimming, and packaging context. The example is brilliant: a user wants to find a coffee shop between two locations. The Agent first calls a mapping tool and gets back a large set of data, then determines on its own that “the next step only requires street names,” trimming the map output down to a short list before feeding it to a local search tool. Every step strips noise out of the information.
There’s a line in the book that I read several times: “To achieve the highest accuracy from AI, you must provide it with short, focused, and powerful context.” Context Engineering is exactly what this is about.
At this level, the Agent can also reflect on itself. After completing a task, it reviews its own work and corrects any issues it finds. I’ll elaborate on this later.
Level 3: Multi-Agent Collaboration
The book’s stance is clear: don’t keep trying to build a single all-powerful super agent. The truly reliable approach is to build a team, such as a project manager agent, a researcher agent, a designer agent, and a copywriter agent. The book illustrates this with a new product launch: a “project manager agent” coordinates everything and delegates tasks to the “market research agent,” “product design agent,” and “marketing agent.” The key is communication: how do agents exchange data, synchronize state, and resolve conflicts? This chapter presents six communication topologies, from the simplest single-agent model to the most flexible custom hybrid, and explains clearly which scenarios each one suits best.
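The Level 2 coffee-shop example can be sketched roughly as follows. This is a minimal illustration, not the book's code: `fake_map_route`, `trim_to_street_names`, and `fake_local_search` are hypothetical stand-ins for a mapping tool, the Agent's trimming step, and a local search tool, and the route data is invented for the demo.

```python
from typing import Dict, List


def fake_map_route(origin: str, destination: str) -> List[Dict[str, object]]:
    """Hypothetical mapping tool: returns verbose route segments."""
    return [
        {"street": "Main St", "distance_m": 400, "turn": "left", "traffic": "light"},
        {"street": "Oak Ave", "distance_m": 900, "turn": "right", "traffic": "heavy"},
        {"street": "Main St", "distance_m": 150, "turn": "straight", "traffic": "light"},
    ]


def trim_to_street_names(segments: List[Dict[str, object]]) -> List[str]:
    """Keep only what the next step needs: unique street names in route order."""
    seen = set()
    streets = []
    for segment in segments:
        name = str(segment["street"])
        if name not in seen:
            seen.add(name)
            streets.append(name)
    return streets


def fake_local_search(query: str, streets: List[str]) -> List[str]:
    """Hypothetical local search tool: it only ever sees the short list."""
    return [f"{query} on {street}" for street in streets]


route = fake_map_route("Office", "Station")   # large, noisy tool output
streets = trim_to_street_names(route)         # context trimmed to street names
results = fake_local_search("coffee shop", streets)
print(streets)
print(results)
```

The point of the sketch is the middle step: the second tool never receives distances, turns, or traffic data, only the street names it actually needs.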

After reviewing these four levels, I suddenly understand why many people say, "My Agent doesn't work well." The model isn't the issue—you're using it like a chatbot, and it might not even reach Level 1.

Context Engineering: The most underestimated concept in the book

I wrote a piece on Harness Engineering, arguing that track design is more important than engine horsepower. After reading this book, I realized that Context Engineering is the prompt-level counterpart of Harness Engineering.

Traditional Prompt Engineering only handles "how you ask." Context Engineering in the book handles "what the Agent sees before being asked." It includes four layers of information:

Layer one: system prompt. Defines who the Agent is, what tone to use, and what boundaries to observe. Most people only write this layer.
Layer two: external data. Documents retrieved by RAG, responses from tool calls, real-time API data. This is where most people get stuck: they know they need to feed data, but they don’t know how to do it without overwhelming the model.
Layer three: implicit data. User identity, interaction history, environmental state. Things you don’t explicitly state but the Agent should know. For example, if you tell the Agent, “Help me send John an email to confirm tomorrow’s meeting,” it should know what meeting you have scheduled for tomorrow in your calendar and your relationship with John.
Layer four: feedback loop. After each output, the agent automatically evaluates quality and adjusts its context strategy for the next interaction. The book refers to this as “automated context optimization,” and Google’s Vertex AI Prompt Optimizer is an engineering implementation of this concept.
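One way to picture the four layers is as fields of a single context object that gets rendered under a size budget. The sketch below is an assumption-laden illustration, not an API from the book: `ContextBundle`, its field names, the character budget, and the "drop the oldest external snippet first" policy are all hypothetical, and the sample data (Alice, the calendar entry) is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextBundle:
    system_prompt: str                                        # layer one
    external_data: List[str] = field(default_factory=list)   # layer two
    implicit_data: List[str] = field(default_factory=list)   # layer three
    feedback_notes: List[str] = field(default_factory=list)  # layer four

    def render(self, max_chars: int = 600) -> str:
        """Join all layers, dropping the oldest external snippets when over budget."""
        external = list(self.external_data)
        while True:
            parts = [self.system_prompt, *self.implicit_data,
                     *self.feedback_notes, *external]
            text = "\n".join(parts)
            if len(text) <= max_chars or not external:
                return text
            external.pop(0)  # assumed policy: trim layer two first


bundle = ContextBundle(
    system_prompt="You are an email assistant. Be concise.",
    external_data=[
        "doc " + "x" * 500,  # a bulky retrieved document
        "Calendar API: meeting with John, 10:00 tomorrow",
    ],
    implicit_data=["User: Alice", "John is Alice's manager"],
    feedback_notes=["Last draft was too long; keep it under 80 words"],
)
prompt = bundle.render(max_chars=300)
print(prompt)
```

Here the bulky document is dropped so that the calendar entry, the implicit facts, and the feedback note all fit. A real feedback loop would rewrite `feedback_notes` after scoring each output; the sketch only shows where such notes would live.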

When I read this, I remembered my earlier article, “AI Agents Are Not Magic,” which included the insight that “your agent needs rules—lots of them.” Looking back, those rules were essentially the manual version of Context Engineering, which this book systematizes.

Reflection: Two agents are truly better than one

This is the most practically valuable pattern in the book for me.

The core of Reflection is simple: after completing a task, the Agent reviews its own work and fixes any issues it finds. But the implementation requires care. The book clearly states: Producer and Critic must be two separate Agents, each with distinct system prompts. A single persona reviewing its own output will inevitably have blind spots. If you ask the same LLM to write code and then review that same code, it will likely say, “It looks good.”

The book provides a complete code example.

The producer's prompt is: "You are a Python developer, write a function to calculate the factorial, handling edge cases and exceptions."
The critic's prompt is: "You are a meticulous senior engineer who reviews code line by line, checking for bugs, style issues, overlooked edge cases, and areas for improvement. If it is perfect, output CODE_IS_PERFECT; otherwise, list all issues."
Then comes a for loop: Producer writes code → Critic reviews → Producer revises based on feedback → Critic reviews again → until Critic says CODE_IS_PERFECT or the maximum number of iterations is reached.

It’s that simple. But the book warns about a commonly overlooked cost: each reflection cycle is a new LLM call, and more iterations mean higher costs. Additionally, as the conversation history grows, the context window becomes filled with earlier versions and critiques, reducing the actual available reasoning space. Therefore, the best practice for Reflection is to set a reasonable maximum number of iterations (the book uses 3), and stop as soon as the Critic is satisfied—don’t chase perfection.
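The loop described above can be sketched as follows. This is not the book's listing: the `llm` parameter is any callable taking a system prompt and a message, `stub_llm` is a scripted stand-in so the example runs without a model, and passing only the latest draft and critique (rather than the full history) is my own assumption about how to keep the context small.

```python
from typing import Callable, Tuple

PRODUCER_PROMPT = (
    "You are a Python developer, write a function to calculate the "
    "factorial, handling edge cases and exceptions."
)
CRITIC_PROMPT = (
    "You are a meticulous senior engineer who reviews code line by line. "
    "If it is perfect, output CODE_IS_PERFECT; otherwise, list all issues."
)

LLM = Callable[[str, str], str]  # (system_prompt, message) -> reply


def reflect(llm: LLM, task: str, max_iterations: int = 3) -> Tuple[str, int]:
    """Run Producer -> Critic rounds until approval or the iteration cap."""
    draft, critique = "", ""
    for round_no in range(1, max_iterations + 1):
        if round_no == 1:
            message = task
        else:
            # Only the latest draft and critique are passed along.
            message = f"{task}\n\nPrevious draft:\n{draft}\n\nCritique:\n{critique}"
        draft = llm(PRODUCER_PROMPT, message)
        critique = llm(CRITIC_PROMPT, draft)
        if "CODE_IS_PERFECT" in critique:
            return draft, round_no  # stop as soon as the Critic is satisfied
    return draft, max_iterations


def stub_llm(system_prompt: str, message: str) -> str:
    """Hypothetical scripted model: the first draft misses negative inputs."""
    if system_prompt == PRODUCER_PROMPT:
        if "Critique:" in message:
            return ("def factorial(n):\n"
                    "    if n < 0:\n"
                    "        raise ValueError('n must be >= 0')\n"
                    "    return 1 if n == 0 else n * factorial(n - 1)")
        return "def factorial(n):\n    return 1 if n == 0 else n * factorial(n - 1)"
    if "ValueError" in message:
        return "CODE_IS_PERFECT"
    return "- negative input is not handled"


final_code, rounds = reflect(stub_llm, "Write a factorial function.")
print(rounds)
print(final_code)
```

Each round costs two calls, one to the Producer and one to the Critic, so a cap of 3 bounds the loop at six calls.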

Its applications go far beyond coding—writing articles, making plans, summarizing documents, solving logic puzzles—all can be handled by the Producer-Critic model. The book lists seven use cases, all sharing the same core logic: generate first, review next, then refine.

A more complex Multi-Agent setup is not necessarily better

The part of this chapter I like most is the six communication topologies. Many people jump straight into complexity, but in reality, most scenarios only need three:

Single Agent (Independent Execution): The task can be broken down into independent subproblems, each handled by its own Agent. Simple and easy to maintain.
Peer-to-Peer: Agents communicate directly without a central control node. Decentralized and fault-tolerant—when one agent fails, it does not affect the overall system. However, coordination costs are high and it can become disorganized.
Supervisor (central orchestration): A Supervisor Agent manages a group of Worker Agents, assigning tasks, collecting results, and resolving conflicts. It offers clear hierarchy and easy management, but it becomes a single point of failure and a performance bottleneck.

The other three (Supervisor-as-Tool, Hierarchical, and Custom Hybrid) are variations and combinations of the first three. The book puts it plainly: the topology you need depends on the complexity of your task. As you break tasks into smaller pieces, communication costs rise—beyond a certain point, the Supervisor pattern becomes more efficient than the hierarchical approach.
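As a rough illustration of the Supervisor topology, here is a minimal sketch using the roles from the book's product-launch example. The `Supervisor` class, `make_worker`, and the hard-coded plan are hypothetical: a real Supervisor would have an LLM produce the plan and would also resolve conflicts between workers, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]


def make_worker(role: str) -> Worker:
    """Hypothetical worker agent: a real one would call a model with its own prompt."""
    def worker(subtask: str) -> str:
        return f"{role} finished: {subtask}"
    return worker


class Supervisor:
    """Central orchestrator: assigns subtasks to workers and collects results."""

    def __init__(self, workers: Dict[str, Worker]) -> None:
        self.workers = workers

    def run(self, plan: List[Tuple[str, str]]) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for role, subtask in plan:
            worker = self.workers.get(role)
            if worker is None:
                # No worker registered for this role; record it instead of failing.
                results[role] = "UNASSIGNED: no worker for this role"
                continue
            results[role] = worker(subtask)
        return results


supervisor = Supervisor({
    "market research": make_worker("market research"),
    "product design": make_worker("product design"),
    "marketing": make_worker("marketing"),
})
plan = [
    ("market research", "size the market"),
    ("product design", "draft the spec"),
    ("legal", "review the terms"),
]
results = supervisor.run(plan)
print(results)
```

Every subtask passes through `Supervisor.run`, which is exactly why this topology is easy to manage and also why it is a single point of failure and a bottleneck.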

My experience is that many people building Multi-Agent systems spend 80% of their time on communication protocols, forgetting to ask a more fundamental question: Does this task really require multiple agents? The book clearly states that a Level 2 single agent with reflection is often sufficient. Level 3 is designed for scenarios where a single agent truly cannot handle the task.

The three-layer memory model: I had a vague sense of it before but never named it

The Memory chapter resonates with me the most, because while writing my two articles on Obsidian + Claude, I kept pondering: How should an agent’s memory be layered?

The book provides the answer:

Session: The context window for the current conversation, representing the shortest memory, which is lost once the conversation ends. Long-context models merely expand this window, but it remains fundamentally temporary, and each inference requires processing the entire window, making it costly and slow.
State: Temporary data during the current task, such as “what task is currently being performed,” “how far along it has progressed,” and “what intermediate data has been generated.” It lasts longer than a Session but is cleared once the task ends. The book provides a complete example using Google ADK’s State mechanism.
Memory (persistent layer): Long-term memory that spans sessions and tasks. User preferences, learned experiences, and important historical decisions are stored in databases or vector libraries with semantic retrieval. The book emphasizes a crucial point: Memory is not just about storing data—it requires a comprehensive strategy for deciding what to store, when to store it, and how to retrieve it. Storing too much creates noise; storing too little leaves you underinformed.
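The three layers can be sketched as one small class. This is an illustrative assumption, not Google ADK's API: `AgentMemory` and its methods are hypothetical, the storage policy (skip short or duplicate facts) is invented for the demo, and keyword overlap is a crude stand-in for the semantic retrieval a vector store would provide.

```python
from typing import Dict, List


class AgentMemory:
    def __init__(self) -> None:
        self.session: List[str] = []         # conversation turns, gone when the chat ends
        self.state: Dict[str, object] = {}   # task-scoped data, cleared when the task ends
        self.long_term: List[str] = []       # persists across sessions and tasks

    def remember(self, fact: str) -> bool:
        """Assumed storage policy: skip very short facts and duplicates."""
        if len(fact) < 10 or fact in self.long_term:
            return False
        self.long_term.append(fact)
        return True

    def recall(self, query: str, top_k: int = 2) -> List[str]:
        """Keyword-overlap retrieval, standing in for semantic search."""
        words = set(query.lower().split())
        scored = [(len(words & set(fact.lower().split())), fact)
                  for fact in self.long_term]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [fact for _, fact in scored[:top_k]]

    def end_task(self) -> None:
        self.state.clear()

    def end_session(self) -> None:
        self.session.clear()


mem = AgentMemory()
mem.session.append("user: draft the weekly report")
mem.state["step"] = "collecting data"
stored = mem.remember("User prefers reports in bullet points")
skipped = mem.remember("ok")  # too short to be worth storing
mem.end_task()
mem.end_session()
hits = mem.recall("how does the user like reports")
print(hits)
```

After the task and session end, only the long-term layer still holds anything, which is the distinction the book draws between the three layers.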

In my previous article about Clawdbot, I mentioned "state files" and "workspace documents," which essentially amounted to manually building the State and Memory layers; the book formalizes this concept.

Five hypotheses, the fifth being the wildest

The book concludes with five hypotheses about the future of Agents. The first four remain within the realm of reasonable speculation: general-purpose Agents progressing from writing code to managing projects, deeply personalized Agents proactively identifying your needs, embodied intelligence stepping off screens into the physical world, and Agents becoming independent economic entities.

The fifth one blew me away: Morphing Multi-Agent.

You only state the goal, such as “start an e-commerce business selling premium coffee.” The system automatically decides to first create a “Market Research Agent” and a “Branding Agent.” After running one round of data, it determines the Branding Agent is no longer needed and splits it into three new agents: “Logo Design Agent,” “Website Building Agent,” and “Supply Chain Agent.” If the Website Building Agent becomes a bottleneck, the system automatically duplicates three parallel agents to work on different pages simultaneously. Throughout the entire process, the system continuously and automatically optimizes each agent’s prompt and restructures the team architecture.

The book calls this a "goal-driven, self-transforming multi-agent system." It doesn't execute the plan you wrote—it generates its own plan, adjusts it, and reorganizes its execution team.

This reminds me of Karpathy's AutoResearch: write a program.md defining goals, metrics, and boundaries, then hit “launch.” Humans stay outside the loop. But this book goes further: even how to form and restructure the Agent team is left to the system itself. Humans only declare “what” they want.

Three things you can do right away

After reading this book, here are three actions you can take immediately:

First, add a Critic to your current Agent. Whether you're using Claude Code, CrewAI, or your own framework, add one final step to your existing workflow: have another Agent (with a different system prompt) review the output from the previous step. Code generation followed by code review, article writing followed by fact-checking, planning followed by feasibility assessment. It costs one additional LLM call, and in my experience the quality gain is usually well worth it. The Producer-Critic pattern in the book is plug-and-play.
Second, start doing Context Engineering, not just Prompt Engineering. Look back at the instruction file you wrote for your Agent. If it contains only rules about what the Agent should do and nothing about the environment it is currently working in, add that context. Tell the Agent which project it’s in, what decisions it has made previously, and what the user’s preferences are. The book’s chapter on Context Engineering and your AGENTS.md are two expressions of the same thing.
Third, don’t rush into Multi-Agent systems. First, get your single Agent to Level 2: equipped with tools, reflection, and memory. The book repeatedly emphasizes that a Level 2 single Agent, combined with Producer-Critic and Context Engineering, can cover the vast majority of real-world scenarios. Level 3 is designed for truly cross-domain, multi-stage tasks requiring parallel delegation. Most people’s problem isn’t having too few Agents—it’s that they haven’t properly tuned even one.

This book is 453 pages, published by Springer in 2025. Code examples cover LangChain/LangGraph, Google ADK, CrewAI, and the OpenAI API. It has two forewords: one by the VP of Google Cloud AI and one by the CIO of Goldman Sachs, which is surprisingly engaging.

But comprehensiveness isn’t the reason I recommend it. After reading it, you’ll realize one thing: all the pitfalls you’ve encountered with Agents over the past six months have already been documented as patterns. You no longer need to reinvent Reflection, guess how to layer Memory, or experiment with which communication topology to use for Multi-Agent systems.

Someone has drawn the map for you; all that’s left is to walk.

Are you developing with an AI Agent? What level is your current Agent at?