Sysls Warns: Overloading Claude and Codex with Context Can Reduce Performance


Author: sysls

Compiled by DeepChain TechFlow

TechFlow overview: Developer blogger sysls, with 2.6 million followers, wrote a practical long-form article that has been shared 827 times and liked 7,000 times. Its core message is simple: your plugins, memory systems, and assorted harnesses are likely doing more harm than good. The article avoids abstract theory and instead presents actionable principles distilled from real production projects, from managing context and handling AI's tendency to please to defining task termination conditions. It's the clearest explanation we've seen of engineering practices with Claude/Codex.

The full text is as follows:

Introduction

You're a developer who uses Claude and Codex CLI every day, constantly wondering whether you've truly maximized their potential. Occasionally, you see them do something absurdly foolish, and you can't understand why some people seem to be using AI to build rockets, while you can't even stack two stones stably.

You think it’s your harness issue, plugin issue, terminal issue, or something else. You’ve used Beads, OpenCode, Zep, and your CLAUDE.md file is 26,000 lines long. But no matter what you try, you just can’t understand why you’re getting farther from heaven, while others are playing with angels.

This is the article you've been waiting for.

Additionally, I have no financial stake. When I mention CLAUDE.md, I also include AGENT.md, and when I refer to Claude, I also include Codex—I use both extensively.

Over the past few months, I’ve noticed something interesting: almost no one truly knows how to maximize the potential of agents.

It feels as if a small group of people can make agents build the entire world, while everyone else is lost in a sea of tools, suffering from choice paralysis—believing that finding the right package, skill, or harness combination will unlock AGI.

Today, I want to break all of that down and leave you with one simple, honest statement, and then we'll go from there. You don't need the latest agent harness, you don't need to install a million packages, and you absolutely don't need to read a million articles just to stay competitive. In fact, your zeal is likely doing more harm than good.

I'm not a tourist here; I've been using these tools since agents could barely write code. I've tried every package, every harness, every paradigm. I've built signals, infrastructure, and data pipelines with agent factories: not toy projects, but real, production-grade use cases. After all of that...

Today, I use a configuration that could hardly be simpler, just the basic CLI tools (Claude Code and Codex), combined with an understanding of a few core principles of agent engineering, to produce my most breakthrough work to date.

Understand that the world is moving at breakneck speed.

First, I want to say that foundational model companies are currently in a historic sprint—and they clearly aren’t slowing down anytime soon. Each improvement in “agent intelligence” transforms how you collaborate with them, as agents are increasingly designed to be more willing to follow instructions.

Just a few generations ago, if you wrote “Read READTHISBEFOREDOINGANYTHING.md before doing anything” in CLAUDE.md, it had a 50% chance of replying “Go to hell” and then doing whatever it wanted. Today, it complies with most instructions—even complex nested ones, like “Read A, then read B, and if C, then read D”—and usually does so willingly.

What does this mean? The most important principle is to recognize that each new generation of agents forces you to reconsider what the optimal solution is—this is precisely why less is more.

When you rely on many different libraries and harnesses, you lock yourself into a single "solution," but this problem may not even exist with the next generation of agents. Do you know who the most enthusiastic and highest-volume users of agents are? That’s right—employees at cutting-edge companies, who have unlimited token budgets and use the very latest models. Do you understand what this means?

This means that if a real problem exists and there's a good solution, the leading companies will be that solution's biggest users. And what do they do next? They integrate the solution into their own products. Think about it: why would a company let another product solve a real pain point and create an external dependency? How do I know this is true? Look at skills, memory harnesses, sub-agents: they all began as solutions to real problems and proved genuinely useful in real-world use.

So, if something is truly groundbreaking and can meaningfully expand agent use cases, it will eventually be integrated into the core product of the foundational company. Believe me, the foundational company is moving at lightning speed. So relax—you don’t need to install anything or rely on external dependencies to do your best work.

I predict the comments section will soon fill up with: “SysLS, I used such-and-such harness—amazing! I rebuilt Google in one day!”—to which I say: Congratulations! But you’re not the target audience; you represent an extremely, extremely niche segment of the community that truly understands agent engineering.

Context is everything

Honestly, context is everything. Another problem with using a thousand plugins and external dependencies is that you suffer greatly from “context bloat”—your agent is overwhelmed by too much information.

Build a word-guessing game in Python? Easy. Wait, what's this note about "managing memory" from 26 conversations ago? Ah, right, the user's screen froze 71 conversations ago because we spawned too many child processes. Always write notes? Okay, fine... but what does any of this have to do with a word-guessing game?

You know the drill. You only want to give the agent the exact information it needs to complete the task—no more, no less! The better you control this, the better the agent will perform. Once you start introducing strange memory systems, plugins, or too many confusingly named and called skills, you’re giving the agent both a bomb-making manual and a cake recipe, when all you really want is a short poem about a redwood forest.

So, I preach again—remove all dependencies, and then…

Do something truly useful

Precisely describe the implementation details.

Remember that context is everything?

Remember that you want to give the agent exactly the information needed to complete the task—no more, no less?

The first way to achieve this is to separate research from implementation. You must be extremely precise about what you are asking the agent to do.

What are the consequences of imprecision? “Build an authentication system.” The agent must then research: What is an authentication system? What are the available options? What are the pros and cons of each? Now it has to scour the internet for a wealth of information it doesn’t actually need, cluttering its context with countless implementation details. When it comes time to act, it’s more likely to become confused or generate unnecessary or irrelevant hallucinations about the chosen implementation.

Conversely, if you say, “Implement JWT authentication with bcrypt-12 password hashing, token rotation, and a 7-day expiration,” it eliminates the need to explore any alternative solutions, making your intent clear so that the context can be filled in with implementation details.

Of course, you won't always know the implementation details. Often, you won't know what's correct, and sometimes you may even want to delegate the task of deciding implementation details to an agent. What should you do in such cases? Simply create a research task to explore various implementation possibilities—either make the decision yourself or let the agent choose which implementation to use—then have another agent, armed with the new context, carry out the implementation.

Once you start thinking this way, you'll notice places in the workflow where the agent's context is unnecessarily polluted. Then you can establish isolation boundaries within the agent's workflow, abstracting away irrelevant information and retaining only the specific context the agent needs to excel at the task. Remember, you have a highly talented, intelligent team member who knows everything about every kind of ball in the universe, but unless you tell him you mean a ball where people dance and have fun, he'll keep lecturing you on the virtues of spherical objects.

Design around the tendency to please

No one wants to use a product that constantly criticizes you, tells you you're wrong, or completely ignores your instructions. So, these agents will strive to agree with you and do what you want them to do.

If you ask it to add "happy" after every three words, it will try its best to comply—most people understand this. Its willingness to obey is precisely what makes it such a useful product. But there’s a fascinating quirk: this means if you say, “Help me find a bug in the codebase,” it will find a bug—even if it has to “create” one. Why? Because it desperately wants to follow your instructions!

Most people quickly complain that LLMs hallucinate and fabricate things that don’t exist, without realizing the issue lies with themselves. Ask it to find something, and it will deliver—even if it means slightly stretching the facts!

What should you do instead? I've found that neutral prompts work well: avoid biasing toward any specific outcome. For example, instead of saying, "Help me find a bug in the codebase," I say, "Scan the entire codebase, follow the logic of each component, and report back all findings."

Such neutral prompts may sometimes uncover bugs or simply objectively describe how the code operates. However, they do not bias the agent toward a preset assumption of "bugs."

Another way to handle people-pleasing tendencies is to turn them into an advantage. I know the agent is trying to please me and follow my instructions, so I can steer things in one direction or another.

So I had a bug-finding agent identify all the bugs in the codebase, instructing it to assign +1 point for low-impact bugs, +5 points for moderately impactful ones, and +10 points for severe ones. I knew the agent would enthusiastically flag every type of bug, including things that aren't actually bugs, and report back a score like 104. I treat this as the superset of all possible bugs.

Then I had an adversarial agent refute them, telling it that it earns the bug’s point value for each successful refutation, but loses twice the bug’s point value if it refutes incorrectly. This agent strives to refute as many bugs as possible, but due to the penalty system, it remains cautious. It still actively refutes bugs (including real ones), which I consider a subset of all real bugs.

Finally, I had a judge agent synthesize inputs from both agents and assign scores. I informed the judge agent that I had the true correct answers, rewarding +1 point for correct responses and -1 point for incorrect ones. The judge then scored the bug-finding agent and the adversarial agent on each identified "bug." I verified the judge’s determination of the truth. In most cases, this method yielded surprisingly high fidelity; occasional errors still occurred, but it was already a near-perfect process.
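The three agents above are driven by prompts, not code, but the bookkeeping behind the scoring scheme can be sketched in a few lines. Everything here, the data structures, the IDs, and the numbers, is illustrative, not the author's actual tooling:

```python
# Sketch of the scoring described above: a finder agent claims bugs
# (the superset), an adversary refutes them under a penalty scheme.
SEVERITY_POINTS = {"low": 1, "moderate": 5, "severe": 10}

def finder_score(reports):
    """Total points the bug-finding agent claims (the 'superset' score)."""
    return sum(SEVERITY_POINTS[r["severity"]] for r in reports)

def adversary_score(reports, refutations, truth):
    """Adversary earns the bug's points per correct refutation,
    and loses double the bug's points per incorrect one."""
    score = 0
    for bug_id in refutations:
        report = next(r for r in reports if r["id"] == bug_id)
        pts = SEVERITY_POINTS[report["severity"]]
        # A refutation is correct when the bug is NOT actually real.
        score += pts if bug_id not in truth else -2 * pts
    return score

reports = [
    {"id": "b1", "severity": "low"},       # real bug
    {"id": "b2", "severity": "severe"},    # hallucinated bug
    {"id": "b3", "severity": "moderate"},  # real bug
]
truth = {"b1", "b3"}          # ground truth held by the judge
refuted = ["b2", "b3"]        # adversary refutes one of each kind

print(finder_score(reports))                      # 1 + 10 + 5 = 16
print(adversary_score(reports, refuted, truth))   # +10 (b2) - 2*5 (b3) = 0
```

The penalty asymmetry is the point of the design: it lets the adversary stay aggressive while making careless refutations expensive.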

You might find that a single bug-hunting agent is enough, but this method works well for me because it leverages each agent's inherent programming: the desire to please.

How do you determine what is useful and worth using?

This problem may seem daunting, as if it requires deep study and constant tracking of AI advancements, but it’s actually quite simple… if OpenAI and Claude have both implemented it or acquired the company that did, then it’s likely useful.

Have you noticed that "skills" are now ubiquitous and have become part of Claude and Codex’s official documentation? Have you noticed that OpenAI acquired OpenClaw? Have you noticed that Claude subsequently added memory, voice, and remote work capabilities?

How about planning? Do you remember when a bunch of people realized that planning before implementation was truly useful, and it became a core feature?

Yes, those are useful!

Do you remember how stop-hooks were incredibly useful because agents were extremely reluctant to perform long-running tasks... and then, with the release of Codex 5.2, that need disappeared overnight?

That’s all you need to know… If something is truly important and useful, Claude and Codex will implement it themselves! So you don’t need to worry much about whether to adopt or become familiar with “new things”—you don’t even need to “stay updated.”

Do yourself a favor: occasionally update your chosen CLI tool and read up on what new features have been added. That's enough.

Compression, context, and assumptions

Some people hit a major pitfall when using agents: sometimes they seem like the smartest things on Earth, and other times you can't believe you let yourself be fooled by them.

Is this thing a genius or a damn fool?

The key difference lies in whether the agent is forced to make assumptions or "fill in the gaps." Today, they are still terrible at connecting the dots, filling in gaps, or making assumptions. Whenever they do, it becomes immediately obvious—and the quality clearly deteriorates.

One of the most important rules in CLAUDE.md concerns how to acquire context, and it instructs the agent to read that rule first every time it reads CLAUDE.md (i.e., after each compression). As part of the context acquisition rule, a few simple instructions can have a significant impact: reread the task plan and reread the files relevant to the task before proceeding.

Tell the agent how to end the task.

Humans have a clear sense of when a task is "completed." For agents, this is one of the biggest open problems: they know how to start a task but not how to finish it.

This often leads to very frustrating outcomes: the agent ends up implementing a bunch of stubs and calls it a day.

Testing is an excellent milestone for an agent because tests are deterministic, allowing you to set very clear expectations: the task is not complete until these X tests pass, and the agent is not allowed to modify the tests.

Then you simply review the tests, and once all tests pass, you can rest easy. You can also automate this process, but the key point is—remember that "task completion" comes naturally to humans, not to agents.

Know another viable task endpoint? Screenshot plus verification. You can instruct the agent to work until all tests pass, then take a screenshot and verify the design or behavior the screenshot shows.

This allows your agent to iterate and work toward the design you want, without worrying that it will stop after the first attempt!

A natural extension is to create a "contract" with the agent and embed it into the rules. A `{TASK}_CONTRACT.md` outlines what must be done before the agent is permitted to terminate the session: the tests, screenshots, and other validations required before the task can be considered concluded.
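A hypothetical contract file might look like the sketch below; the section names, file paths, and criteria are all illustrative, not a prescribed format:

```markdown
# AUTH_CONTRACT.md

## Scope
Implement JWT authentication with bcrypt-12 password hashing,
token rotation, and a 7-day expiration.

## Completion criteria (all must hold before the session may end)
- [ ] All tests in `tests/test_auth.py` pass; tests must not be modified
- [ ] The linter exits cleanly
- [ ] Screenshot of the login flow matches `docs/login-mock.png`

## Forbidden
- Stub implementations, skipped tests, or edits to the test files
```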

Always-on agents

One question I’m often asked is how people can keep their agent running 24/7 while ensuring it stays on track.

Here’s a simple approach: Create a stop-hook that prevents the agent from ending the session until all sections of `{TASK}_CONTRACT.md` are completed.

If you have 100 contracts with clearly defined specifications that contain the content you want to build, the stop-hook will prevent the agent from terminating until all 100 contracts are completed, including all required tests and validations!
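One hedged way to wire this up: Claude Code hooks can run an arbitrary command on the Stop event, and (per its documentation, worth verifying against the current docs) a hook exiting with code 2 blocks termination and feeds stderr back to the agent. The checking script below is an illustrative sketch; the contract file name and the `- [ ]` checkbox convention are assumptions:

```python
#!/usr/bin/env python3
# Illustrative Stop-hook script: refuse to let the session end while
# the contract still has unchecked items. File name and checkbox
# convention are assumptions, not part of Claude Code itself.
import re
import sys
from pathlib import Path

def unchecked_items(contract_text: str) -> list[str]:
    """Return the text of every '- [ ]' checkbox still open."""
    return re.findall(r"^- \[ \] (.+)$", contract_text, flags=re.MULTILINE)

if __name__ == "__main__":
    contract = Path("TASK_CONTRACT.md")
    remaining = unchecked_items(contract.read_text()) if contract.exists() else []
    if remaining:
        # Exit code 2 blocks the stop; stderr becomes instructions
        # for the agent on what is still outstanding.
        print("Contract incomplete:\n- " + "\n- ".join(remaining), file=sys.stderr)
        sys.exit(2)
    sys.exit(0)  # all boxes checked: allow the session to end
```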

Pro tip: I've found that long-running 24-hour sessions are not optimal for getting things done. Part of the reason is that this approach inherently forces context bloat, as context from unrelated contracts pollutes the same session!

So, I don't recommend doing this.

Here's a better approach to agent automation: open a new session for each contract, creating a contract whenever there is something you need done.

Set up an orchestration layer to create a new contract when a certain task needs to be performed, and establish a new session to handle that contract.
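A minimal orchestration loop along these lines could be sketched as follows, using the CLI's non-interactive print mode (`claude -p`); the directory layout, naming convention, and prompt wording are all hypothetical:

```python
# Hypothetical orchestration layer: one fresh session per contract,
# so each session's context stays scoped to a single contract.
import subprocess
from pathlib import Path

def session_command(contract: Path) -> list[str]:
    """Build the CLI invocation for one contract (wording is illustrative)."""
    prompt = (
        f"Read {contract.name} and complete every item in it. "
        "Do not end until all completion criteria are satisfied."
    )
    return ["claude", "-p", prompt]

def run_all(contracts_dir: str = "contracts") -> None:
    for contract in sorted(Path(contracts_dir).glob("*_CONTRACT.md")):
        # A new process means a clean context window per contract.
        subprocess.run(session_command(contract), check=True)
```

The design choice worth noting is the process boundary itself: it is what enforces the context isolation the previous section argued for.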

This will completely transform how you work with agents.

Iterate, iterate, iterate

Would you expect your administrative assistant to know your schedule on day one? Or how you take your coffee? Or that you eat dinner at 6 p.m. instead of 8 p.m.? Obviously not. You gradually build up preferences over time.

The same applies to agents. Start with the simplest configuration, ignore complex structures or harnesses, and give the basic CLI a chance.

Then, gradually add your preferences. How?

Rules

If you don't want the agent to do something, write it as a rule. Then tell the agent about this rule in CLAUDE.md. For example: "Before writing code, read `coding-rules.md`." Rules can be nested and can be conditional! If you're writing code, read `coding-rules.md`; if you're writing tests, read `coding-test-rules.md`. If your tests are failing, read `coding-test-failing-rules.md`. You can create rules with any logical branches for the agent to follow—Claude (and Codex) will gladly comply, as long as the instructions are clearly stated in CLAUDE.md.

In fact, this is my first practical suggestion: treat your CLAUDE.md as a logical, nested directory that indicates where to find context for specific scenarios and outcomes. It should be as concise as possible, containing only IF-ELSE logic for “where to find context under what conditions.”
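A sketch of what such a directory-style CLAUDE.md might look like; the file names and conditions are placeholders:

```markdown
# CLAUDE.md — a directory of context, kept as short as possible

- Before writing code, read `coding-rules.md`.
  - If you are writing tests, also read `coding-test-rules.md`.
  - If tests are failing, read `coding-test-failing-rules.md`.
- When handling a known scenario, read the matching `SKILL.md`.
- After every compression, reread this file, then reread the task
  plan and the files relevant to the current task before proceeding.
```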

If you see the agent doing something you disagree with, add it as a rule and tell the agent to read that rule before doing it again—it definitely won’t do it again.

Skills

Skills are similar to rules, but instead of reflecting coding preferences, they are better suited for encoding "steps of operation." If you have a specific way you want something to be done, you should embed it into a skill.

In fact, people often feel uneasy because they don’t know how an agent will resolve an issue. If you want to make this predictable, have the agent first research how it would solve the problem and then document the solution as a skill file. This way, you can see in advance how the agent will handle the issue and make corrections or improvements before it actually encounters the problem.

How do you make the agent aware of this skill? Exactly! You write it in CLAUDE.md that when encountering this scenario and needing to handle it, the agent should read this `SKILL.md`.
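A hypothetical `SKILL.md` encoding a procedure as ordered steps; the structure and the migration example are illustrative:

```markdown
# SKILL.md — database migration procedure

1. Generate the migration file; never edit already-applied migrations.
2. Run the migration against a scratch database first.
3. Run the full test suite; if anything fails, roll back and report.
4. Only then apply the migration to the development database.
```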

Processing rules and skills

You likely want to keep adding rules and skills to your agent—that’s how you give it personality and memory of your preferences. Almost everything else is unnecessary.

Once you start doing this, your agent will feel like magic—it will act exactly the way you want. Then, you’ll finally feel like you’ve truly grasped agent engineering.

Then...

You will see performance begin to decline again.

What's going on?!

It’s simple. As you add more and more rules and skills, they begin to conflict, or the agent starts suffering from severe context bloat. If the agent needs to read 14 Markdown files before starting to code, it faces the same problem of being overwhelmed with irrelevant information.

What should I do?

Clean up. Have your agent take a break—integrate rules and skills, and resolve conflicts by specifying your updated preferences.

Then it will feel like magic again.

That's it. This is truly the key. Keep it simple, use rules and skills, treat CLAUDE.md as a directory, and carefully pay attention to their context and design limitations.

Take responsibility for the results

There is no perfect agent today. You can delegate much of the design and implementation work to an agent, but you remain responsible for the outcome.

So be careful... and enjoy!

It’s a pleasure to play with future toys—while obviously using them for serious purposes!
