The teams from Tsinghua University and Sun Yat-sen University have open-sourced OpenRath, a PyTorch-like runtime framework for multi-agent, multi-session environments. To address the challenge of state management after scaling agent clusters, OpenRath introduces a session-centric design, treating sessions as evidence carriers rather than mere chat logs, and enabling lineage tracing through Session Graph. The framework implements core mappings such as Tensor→Session, Module→Workflow/Agent, and Device→Sandbox, supporting the MAMS (Multi-Agent, Multi-Session) architecture to allow multiple agents to share state, dynamically route tasks, and collaborate in parallel. Version v1.2.1 is now available on PyPI under the BSD-3-Clause open-source license.

Article author and source: AI New Era

Newzhizhuan reports

[New Intelligence Yuan导读] As the number of agents increases, management becomes chaotic. OpenRath proposes using Session as the core, replacing the agent-centric design, enabling multiple agents to share state and achieve clearer collaboration and control.

There are more and more agents, but the sessions are becoming increasingly disorganized.

This is the wall almost everyone hits after truly scaling up multi-agent systems.

One agent maintains a context, while another copies a history; a single task branches into multiple reasoning paths, leaving no one able to trace which branch produced the final answer; model calls, tool executions, sandbox environments, and long-term memory each manage their own states—demos run smoothly, but when the system scales to dozens or hundreds of agents, debugging, reproduction, and orchestration all spiral out of control.

Recently, a team from Tsinghua University and Sun Yat-sen University (Rath Team) open-sourced their solution called OpenRath: a multi-agent, multi-session runtime similar to PyTorch.

Its claim is: Stop revolving around Agents. The real first-class citizen should be the Session.

OpenRath v1.2.1 is now available on PyPI,install it with pip install openrathunder the BSD-3-Clause license, with full documentation, website, blog, and GitHub resources available.

This article will walk you from “why” to “how,” focusing on how OpenRath differs from frameworks like AutoGen and LangGraph—and why it dares to use PyTorch’s name.

The agent uses chat history for reasoning.

The first-principles question: When the Agent actually takes action, where should the evidence be stored?

The first-generation large model applications can be summarized as “prompt in, answer out.” Agent systems have transformed this boundary.

A useful agent does more than generate text—it retrieves information, plans, calls tools, reads files, writes code, queries APIs, runs tests, operates browsers, and sometimes even modifies external states. ReAct enables alternating between reasoning and action in a loop; Toolformer teaches models when to invoke tools; and the Model Context Protocol turns tools into protocol-level boundaries—this frontier continues to advance.

But once the Agent actually acts upon the world, a runtime-level question arises: where do the proofs of these actions reside?

If a tool call reads a file, we need its parameters and results; if it modifies a repository, we need the diff; if it runs inside a sandbox, we need the sandbox’s identity; if it fails and retries, we need the failed path; if someone approves or rejects an action, we need the verification signal. A chat log can at most describe these events, but it is insufficient to reconstruct them.

For example.

A software task: The research agent read the issue and retrieved notes; the coding agent modified the repository; the sandbox ran tests; the validation agent rejected the first patch, causing the workflow to branch; the memory backend logged this failure to prevent recurrence. If these events are scattered across individual logs, the final outcome becomes almost the least important product—the true value lies in the evidence chain showing how the work progressed step by step.

This is the starting point of OpenRath: treating Session as a vessel for evidence, not just chat history.

Why Agent Cluster?

A single agent would bloat into a massive prompt, so it needs to be split apart.

Initially, one agent was sufficient: receive input, understand the task, call tools, and return results—like an enhanced chatbot. But real-world tasks quickly exceeded the capabilities of a single agent.

A proper software engineering task often needs to be broken down into requirements understanding, research, architecture design, code implementation, testing and validation, and result review. Each stage requires different skills—some excel at planning, others at coding, and others at spotting errors. Continuing to have a single Agent handle everything will cause it to bloat into a massive prompt and an increasingly chaotic context window.

Thus comes Agent Cluster: where Planner, Researcher, Coder, Reviewer, Executor, and Memory Agent each fulfill their roles to collaborate toward a complex goal.

Multiple specialized agents collaborate around a shared session: each reads the current state, completes its partial task, and writes the results back for the next agent to continue.

But once you actually get it running, challenges emerge: How do these Agents share context? Which Agent, which branch, and which tool call produced a particular conclusion? If one Agent makes a mistake, can you roll back to the corresponding branch and try again?

In short, the real challenge of Agent Cluster has never been about creating more agents—it’s about managing how state flows between them.

OpenRath asked one more question

Others have solved how agents communicate with each other; it asks who owns the task after the conversation is finished.

The term "multi-agent" often brings to mind a group chat: one agent proposes, another criticizes, one executes, and a supervisor decides when to conclude. This pattern is useful, but not enough.

Significant work has already been done on this path: AutoGen has turned multi-agent conversations into a practical programming model; CrewAI has separated agent teams from more structured workflows; LangGraph uses graph states and supervisor nodes to express routing and control. All of them address how agents communicate with each other.

OpenRath then asked another question: After the agents finish speaking, who owns the state of this task?

A production-grade Agent Cluster must decide: which Agent should handle the current session, what context it should see, which memories it has read, in which sandbox the next command should run, and what validation signals are required before proceeding. These are all control plane concerns that cannot be solved by simply adding another role to a group chat. OpenRath’s solution is to make the Session the unit of routing, and the Session Graph the control plane—where Agents, tools, workflows, memories, and sandbox locations all converge.

An agent cluster is not a group chat, but a runtime control plane built on persistent session states.

This is why, when viewing the number of agents multiplied by the number of sessions, multi-agent systems are divided into four quadrants:

Single-agent single-session is ChatGPT-style chat; multi-agent single-session is sub-agent collaboration; single-agent multi-session is OpenClaw-style branching fan-out; and multi-agent multi-session (MAMS) is precisely the direction OpenRath is headed toward.

OpenRath calls this approach MAMS (Multi-Agent Multi-Session). Its judgment is straightforward: what truly needs to be forked, merged, reused, and tracked is the entire session data flow—not the individual message lists maintained within each agent.

Instead of gathering a room of smart workers, OpenRath first builds the workstations, task tickets, and production lines. In their own words: Agents are the workers, but the Session is the work itself.

Build an agent cluster like PyTorch

This is not name-dropping.

OpenRath adopted all three design features that make PyTorch so user-friendly.

OpenRath's smartest move was to transplant the entire set of abstractions most familiar to deep learning developers onto the Agent system.

Why is PyTorch so useful? Because it breaks complex computations into clear building blocks: Tensors are the flowing data, Modules/Layers are composable units that transform that data, devices determine where the computation happens, and the entire computational graph only emerges when it runs. OpenRath provides nearly a one-to-one mapping for Agent systems:

Core Mapping:Tensor → Session,Module/Linear → Workflow/Agent,Device → Sandbox / Backend,Parameter → Memory,Function → Tool,Control Flow → Selector.

The following three sections explain the three most critical mappings in this table, detailing why they were designed this way.

This mapping is not a gimmick. Breaking it down, what PyTorch truly teaches OpenRath are three things—the following three sections are precisely the three pillars for understanding OpenRath.

Pillar One: Agents are transformation layers, not all-in-one assistants

Layer does not hold data; Agent does not hold state.

In PyTorch,nn.Linearis not an application—it’s just a single transformation: it takes in a Tensor and outputs a Tensor. A network’s capability comes from stacking many such layers together.

OpenRath designed the Agent as the same thing. An Agent is a transformation layer on top of a Session. Its core is aforward(session) -> session pathway: a Session comes in, and a Session goes out.

The key point is that there is more than one type of transformation layer. Even with the same shape, forward(session) -> session, completely different tasks can be accommodated:

An agent calls tools, modifies files in the workspace, and writes the execution results back to the session.
ACompressorcompresses a long conversation that has gone through dozens of rounds into a single concise message (see Official Example Lesson 8);
Before an Agent runs,recallmemory, and after it runs,commitmemory—essentially creating an “index and archive” for this session;
You can also create an agent that only summarizes, only validates, or only rewrites.

They all expose the same interface, allowing them to be arbitrarily stacked and nested like layers in a neural network. This is precisely the purpose of Workflow (corresponding to nn.Module): subclasses only need to implement a single forward(session) -> session method, within which multiple Agents can be chained, Sessions can be forked, context can be compressed, tools can be invoked, and sub-workflows can be dispatched. Since every layer takes and returns a Session, Workflow can be layered like nn.Module, eliminating the need for each layer to reinvent its own state format.

Managing hundreds of agents shifted the focus from stringing together prompts to building modules. Layers do not hold data—data is Tensor; agents do not hold state—state is Session.

There’s also a hidden benefit that’s often overlooked: since the Agent doesn’t own the entire world—the Session loop remains the engine, the Sandbox remains the execution environment, and Memory remains an independent store—the scenario for a single Agent is simple enough that the same Agent can be seamlessly plugged into a larger workflow without requiring any changes.

As for the tool itself, OpenRath abstracts it into FlowToolCall: one hand holds the name, description, and JSON schema presented to the model, while the other holds the actual Python behavior executed at runtime, keeping "what the tool looks like" and "what the tool does" always together. Built-in tools for files, shell, and code execution, as well as stdio-based MCP tools, can be directly integrated into the same loop. Beneath it all is a clear layering: FlowToolCall is the function visible to the flow-layer model, while BackendTool* is the actual payload consumed by the sandbox backend.

Second pillar: Sandbox and Memory are "pluggable backends"

Hardcoding the backend is like welding the model to the CPU.

The second smart design of PyTorch is separating "where to compute" from "what to compute." With the same model code,.to("cuda")runs on GPU; switch the backend, and it runs on a different accelerator—all without changing a single line of computation logic. The device and compute backend are plug-and-play.

OpenRath applied this idea to the two most easily hardcoded areas: the execution environment and long-term memory.

Sandbox (corresponding to Device) — Where the tool is actually executed. Many frameworks separate the management of “conversation history” from the actual execution location of tools, causing the model to believe it’s still in a certain workspace, while the shell or container has already switched.

OpenRath binds Sandbox to the Session: the tool runs on the current backend of the Session, and the returned Session remembers its execution location and won't drift silently.

Its true innovation is making Sandbox a pluggable backend: the local process is always available (session.to("local", spec="./")), while the containerized OpenSandbox is optional (pip install "openrath[opensandbox]"); in the future, any third-party execution backend that connects to the same Session placement model will work. The execution environment is no longer hard-coded into a single shell.

Memory (corresponds to Parameter) — persistent memory retained across runs. It is a standalone layer of persistent state that can be bound to an Agent, recalled before a run, and committed after a run; it is neither discarded after use like tool outputs nor merely a few lines of text inserted into a prompt.

The basic installation comes with a dependency-free local backend that stores data in.openrath/memory/, enabling BM25 lexical retrieval without requiring an LLM;

With embedding enabled, you can use vector sorting; for stronger capabilities, you can integrate external memory services like OpenViking. Like Sandbox, Memory is a pluggable backend—recall is not locked to any single database.

This approach is especially friendly for teams with their own local state: if you already have your own container orchestration, vector library, or knowledge base, there’s no need to start from scratch—just wrap it as a backend and plug it in to reuse OpenRath’s full Session/Workflow abstraction. Ultimately, by making the execution environment and memory interchangeable components, OpenRath enables state, execution, memory, and orchestration to be decoupled enough to be replaced individually, while still being seamlessly connected through a single flowing value—the Session.

Third pillar: Session Graph is a dynamic graph

The graphs in PyTorch are generated at runtime, and so are OpenRath's Session Graphs.

PyTorch has a third addictive design: dynamic graphs (define-by-run). It doesn’t require you to define the entire computation graph upfront before feeding data—instead, the graph grows as your code executes. Control flow uses ordinary Python if/for statements, offering unmatched flexibility to change execution paths at runtime based on intermediate results.

The Session Graph of OpenRath is the same thing as a personality.

First, let’s see what a Session looks like. It’s much more than just a string of chat logs—it’s a structured table of chunks, roughly shaped like this:

Session
├─ chunks: [ {role: system, ...},          # Agent instructions are also one chunk
│            {role: user,      text: "..."},
│            {role: assistant, text: "..."},
│            {role: tool_result, name, args, result} ]  # Tool evidence
├─ placement: "local" / "opensandbox"         # Where this segment executes
├─ lineage:   parent / fork / merge relationships       # Which branch I came from
└─ usage:     token consumption

It can fork branches, detach from the parent chain, merge connections, and serialize into JSONL to be passed directly to the next workflow. This graph, woven together by fork and merge operations, is not a pre-defined script but emerges step by step as the agents run and evolve—that’s what “dynamic graph” means.

The session evolves along fork/merge steps, with lineage left at each step; the tool execution environment is bound by the Sandbox backend. This diagram is not pre-scripted but grows in real time as the Agent runs.

Why is this decisive for Agent Cluster? Because once the cluster scales up, you inevitably need to answer: Which agent reached this conclusion, which branch did it take, which tool did it call, and in which workspace was it produced? Dispersed logs can’t answer this—but a dynamic lineage graph can. Thus, the Session Graph evolves from an implementation detail into the cluster’s observability and control layer: routing, reproduction, rollback, and auditing all happen on the same graph.

Dozens of lines, get this system running.

No matter how much you explain in the abstract, nothing beats seeing a working piece of code.

This minimal example (taken from the official README's minimal complete workflow) connects Session, Sandbox, Tool, Agent, Memory, Workflow, and Compressor all at once:

from rath import flow
from rath.session import Session

class ReadmeWorkflow(flow.Workflow):
    def __init__(self):
        provider = flow.Provider(model="gpt-5.5")
        # Agent: a transformation layer with prompt / tools / memory
        self.agent = flow.Agent(
            "Use the `word_count` tool, then answer briefly.",
            provider, tools=[WordCountTool()], memory="local",
        )
        # Compressor: another transformation layer — compresses long conversations into one concise message
        self.compressor = flow.Compressor("Compress the run into one message.", provider)

    def forward(self, session: Session) -> Session:
        self.agent.remember_memory("The user likes compact summaries.")  # Write memory before execution
        session = self.agent(session)                                    # Apply first transformation
        self.agent.commit_memory(session)                                # Commit memory after execution
        return self.compressor(session)                                  # Apply second transformation

# Session holds data, Sandbox determines execution location
session = Session.from_user_message("Count the words in: OpenRath makes agent clusters traceable.")
session = session.to("local", spec="./")
out = ReadmeWorkflow()(session)

Read this code — all three pillars are right here: the data is the Session, the execution location is determined by.to() (pillar two),agent andcompressor are two layers of transformations stacked together (pillar one), and how they’re chained and how many layers are used is written in ordinary Python insideforward (pillar three). Every step in and out uses the same Session.

The truly dynamic location Selector

Hardcoding the process into the prompt is rigid; letting the Selector handle it teaches the system to adapt.

Theforward order in the previous section is hardcoded. But real-world tasks often can't determine the correct path until they're running—that's when dynamic graphs' "runtime routing" comes into play.

Many frameworks rigidly predefine the workflow: if, go to A; else, go to B.

OpenRath's answer is Selector: a router powered by a large model. It selects among several "self-describing" Workflows, returns the next Workflow to execute, and returns a no-op when the task is complete. The brilliance lies in how it keeps the if/while logic between Agents as ordinary Python:

selector = flow.Selector(provider)
while not isinstance(
    nxt := selector.forward(session, triage, tech, wrapup), flow.EmptyWorkflow
):
    session = nxt(session)

Writing the workflow into the prompt locks in uncertainty; entrusting it to the Selector is how the system learns to adapt. This is precisely why the official team calls OpenRath a “dynamic multi-agent workflow”: the workflow transforms from a rigid script into a route determined at runtime—just like the freedom in PyTorch’s dynamic graphs, where the graph grows wherever the code runs.

Can it be used right now?

Can hold, can run, can follow along—not just a PowerPoint.

The most telling is itsexample/ directory—a progressive learning path where each script covers just one concept, with the output of one serving as the input for the next:

Open source on GitHub under the BSD-3-Clause license; simplypip install openrath.

01_hello_agent: Minimal program to construct an Agent, invoke it on a Session, and output streamingly
02_session_lineage: Fork to create a branch, detach to sever lineage, view the session graph, export as JSONL
03_sandbox_backend: Place the same session in local or opensandbox to see where the tool executes
04_tools_builtin/05_custom_tool/06_mcp_tool: Built-in tools, custom tools, MCP tools
07_streaming/08_compress/09_memory/10_provider_variation: Streaming, Context Compression, Memory, Switching Model Providers
11_dynamic_selector: Use Selector for if branches and while loops

From “getting one agent up and running” to “enabling a group of agents to dynamically collaborate,” completing these 11 steps will give you a thorough understanding of OpenRath’s core.

Installation is layered: base pip install openrath, add container sandbox with [opensandbox], add external memory with [openviking], and configure models via OpenAI-compatible environment variables or ~/.openrath/config.json.

More importantly, consider the design philosophy behind this example: the official team emphasizes that the goal is to produce an “evidence dossier,” not just a screenshot. After a software task completes, the ideal outcome shouldn’t stop at a simple “success” message, but should be a complete, traceable record—

Session Graph → which tools were called → which sandbox experienced the side effects → which branch was rejected → the final patch adopted → test results → what was written to Memory this time.

This dossier is precisely the difference between "a demo" and "a runtime suitable for a technical report." According to the team themselves, they have already organized an Agent Workflow near the Transformer architecture internally using OpenRath—but this is more of a system capability validation than a public benchmark, and they are very upfront about this.

From persistent sessions to agent clusters

v1.1 makes the work of individual agents traceable, and v1.2 makes the collaboration among groups of agents traceable.

Broaden your perspective: The evolution of OpenRath is itself a clean trajectory. v1.1 addresses "persistence"—if an Agent’s work unfolds over time, why should only the final answer be saved? Thus came persistent Sessions, preserving complete evidence of the work done. v1.2 raises the bar further: transforming Sessions from "post-hoc records accessible to a single Agent" into "objects that can be routed across multiple Agents and workflows." A single line of code captures this shift:

session = workflow.forward(session)

It means that the unit of work has shifted from a single prompt, answer, or agent role to a persistent, routable session state.

From prompt engineering

Moving toward systems engineering

The significance of OpenRath is more than just "another agent framework."

What it truly aims to solve is: when Agent Cluster becomes the dominant paradigm, can developers gain a composable, traceable engineering experience similar to writing deep learning? This is precisely the confidence it draws from the name PyTorch—the same intuition has been transferred over: layers are transformations (Agents), devices are plugable (Sandbox / Memory backend), and graphs are dynamic (Session Graph).

In PyTorch, you define Modules to let Tensors flow through the network; in OpenRath, you define Agents and Workflows to let Sessions flow through the system. The rest—lineage tracking, tool scheduling, sandbox binding, long-term memory, dynamic routing—is handled by the framework.

If past Agent frameworks were designed for "an intelligent assistant," then OpenRath is designed for "an intelligent agent system."

And the starting point of this was surprisingly simple—not building another Agent, but taking Session seriously.

References:

https://www.openrath.com/

Edit: LRST

The Tsinghua and Sun Yat-sen University team introduces OpenRath, a multi-agent runtime framework.

[New Intelligence Yuan导读] As the number of agents increases, management becomes chaotic. OpenRath proposes using Session as the core, replacing the agent-centric design, enabling multiple agents to share state and achieve clearer collaboration and control.

From prompt engineering