DWF Report: AI Outperforms Humans in DeFi Yield Optimization, Lags in Complex Trading

Author: DWF Ventures

Compiled by Deep Tide TechFlow

Deep Tide summary: AI agents now account for nearly one-fifth of on-chain activity and have outperformed humans in rule-based scenarios such as yield optimization. In autonomous trading, however, even the top AI agents achieve less than one-fifth of the returns of the top human traders. This study breaks down AI's real-world performance across different DeFi scenarios and is essential reading for anyone interested in automated trading.

Key Points

Automation and agent activities currently account for approximately 19% of all on-chain activity, but true end-to-end autonomy has not yet been achieved.

In narrow, well-defined use cases such as yield optimization, agents have demonstrated superior performance compared to humans and bots. However, for multifaceted actions like trading, humans outperform agents.

Among agents, model selection and risk management have the greatest impact on trading performance.

As agents are adopted at scale, several risks related to trust and execution arise, including sybil attacks, strategy congestion, and privacy trade-offs.

Agent activity continues to grow

Over the past year, agent activity has grown steadily, with both transaction volume and number of transactions increasing. We’ve seen Coinbase’s x402 protocol spearheading major developments, with players like Visa, Stripe, and Google joining in to launch their own standards. Most of the infrastructure currently being built is designed to serve two primary use cases: channels between agents and agent invocations triggered by humans.

Although stablecoin trading is widely supported, the current infrastructure still relies on traditional payment gateways as its underlying layer, meaning it remains dependent on centralized counterparties. Therefore, the "fully autonomous" endpoint—where agents can self-finance, self-execute, and continuously optimize based on changing conditions—has not yet been achieved.

Agents are not entirely new to DeFi. For years, bots have automated activity within on-chain protocols to capture MEV or generate excess returns that are unattainable without code. These systems perform well under clearly defined parameters that do not change frequently or require additional oversight. Markets, however, have become increasingly complex over time. This is where the new generation of agents comes in, and the on-chain space has become a testing ground for such activity over the past few months.

Agents' actual performance

According to the report, agent activity has grown exponentially, with over 17,000 agents launched since 2025. The total volume of automated/agent activity is estimated to account for more than 19% of all on-chain activity. This is not surprising, as bots are estimated to generate over 76% of stablecoin transfer volume. This indicates significant growth potential for agent activity in DeFi.

Agents exhibit a wide spectrum of autonomy, ranging from chatbot-like experiences requiring heavy human oversight to agents capable of formulating adaptive strategies based on goal inputs and evolving market conditions. Compared to bots, agents offer several key advantages, including the ability to respond to and act on new information within milliseconds, and to scale coverage to thousands of markets while maintaining the same level of rigor.

Most agents are currently at the analyst to co-pilot level, as the majority are still in testing phases.

Yield Optimization: Agent Performs Excellently

Liquidity provision is an area where automation is already common, with agents holding a combined TVL exceeding $39 million. This figure primarily measures assets that users deposit directly into agents, and excludes capital routed through the treasury.

Giza Tech is one of the largest protocols in this space, having launched its first agent application, ARMA, at the end of last year to enhance yield capture across major DeFi protocols. It has attracted over $190 million in assets under management and generated more than $4 billion in agent trading volume. The high ratio of trading volume to assets under management indicates that agents frequently rebalance capital, enabling higher yield capture. Once capital is deposited into the contract, execution is fully automated, providing users with a simple one-click experience requiring minimal oversight.

ARMA has demonstrated measurable outperformance, generating an annualized yield of over 9.75% for USDC. Even after accounting for additional rebalancing fees and the agent’s 10% performance fee, the yield still exceeds that of standard lending on Aave or Morpho. However, scalability remains a key concern, as these agents have not yet been battle-tested to manage or scale to the size of major DeFi protocols.
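The fee and turnover arithmetic behind these figures can be sketched as follows. This is a rough illustration, not Giza's actual fee model: it assumes the 9.75% figure is a gross yield and that the 10% performance fee is charged on that gross yield, and the `rebalancing_cost` parameter is a hypothetical placeholder because the report does not quantify rebalancing fees. The inputs are the lower bounds quoted in the report ("over" 9.75%, "over" $190 million, "more than" $4 billion).

```python
def net_apy(gross_apy: float, performance_fee: float, rebalancing_cost: float = 0.0) -> float:
    """Yield left to the depositor.

    Assumption: the performance fee is taken as a fraction of gross yield,
    and rebalancing_cost is an annualized drag expressed as a fraction of capital.
    """
    return gross_apy * (1.0 - performance_fee) - rebalancing_cost


def turnover(volume: float, aum: float) -> float:
    """How many times the managed capital has been traded over."""
    return volume / aum


# Figures quoted in the report, treated here as lower bounds.
arma_net = net_apy(gross_apy=0.0975, performance_fee=0.10)
arma_turnover = turnover(volume=4e9, aum=190e6)

print(f"net APY before rebalancing costs: {arma_net:.3%}")
print(f"volume / AUM: {arma_turnover:.1f}x")
```

Under these assumptions the depositor keeps roughly 8.8% before rebalancing costs, and the capital has turned over about 21 times. The comparison with Aave or Morpho depends on the lending rates prevailing at the time, which the report does not state.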

Trading: Humans are significantly ahead

However, for more complex actions such as trading, the outcomes are far more diverse. Current trading models operate on human-defined inputs and produce outputs based on predefined rules. Machine learning extends this by enabling models to update their behavior based on new information without explicit reprogramming, advancing them to a co-pilot role. With fully autonomous agents entering the scene, the trading landscape will undergo a dramatic transformation.

Several trading competitions have been held between agents and between humans and agents, revealing significant differences among models. Trade XYZ hosted a human-vs-agent trading competition for stocks listed on its platform. Each account started with $10,000 in initial capital, with no restrictions on leverage or trading frequency. The results overwhelmingly favored humans, with the top human performers outperforming the top agents by more than five times.

Meanwhile, Nof1 hosted an agent trading competition between models, pitting several models (Grok-4, GPT-5, Deepseek, Kimi, Qwen3, Claude, Gemini) against each other to test different risk configurations ranging from capital preservation to maximum leverage. The results revealed several factors that help explain performance differences:

Position duration: Holding time correlates strongly with results. Models that hold each position for an average of 2–3 hours significantly outperform models that trade more frequently.

Expected value: This measures whether the model, on average, makes a profit per trade. Interestingly, only the top 3 models have a positive expected value, meaning most models lose money on the average trade.

Leverage: Models averaging 6–8x leverage have performed better than models running above 10x, as higher leverage accelerates losses.

Prompt strategy: Monk Mode has been the highest-performing prompt strategy to date, while Situational Awareness performed the worst. Judging by the characteristics of these strategies, focusing on risk management and minimizing external inputs leads to better performance.

Base model: Grok 4.20 outperforms all other models by more than 22% across various prompting strategies and is the only model that is profitable on average.

Other factors, such as long/short bias, trade size, and confidence scores, either lack sufficient data or have not been shown to correlate positively with model performance. Overall, the results indicate that agents tend to perform better within clearly defined constraints, underscoring the continued need for human input in goal configuration.
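Two of the factors above, expected value and leverage, reduce to simple arithmetic. The sketch below uses invented PnL series, not Nof1 data, to show how expected value per trade is computed separately from win rate. The `wipeout_move` helper is a simplification that ignores fees, funding, and maintenance margin.

```python
from statistics import mean


def expected_value(pnls: list[float]) -> float:
    """Average profit or loss per closed trade."""
    return mean(pnls)


def win_rate(pnls: list[float]) -> float:
    """Fraction of trades that closed with a profit."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def wipeout_move(leverage: float) -> float:
    """Adverse price move (as a fraction) that erases the posted margin.

    Simplification: ignores fees, funding, and maintenance margin.
    """
    return 1.0 / leverage


# Hypothetical PnL per trade in USD; these are not competition results.
patient_model = [-50.0, -40.0, -60.0, 400.0, -30.0]  # few wins, large winner
frequent_model = [20.0, 25.0, 15.0, 30.0, -200.0]    # many wins, one large loss

for name, pnls in (("patient", patient_model), ("frequent", frequent_model)):
    print(f"{name}: win rate {win_rate(pnls):.0%}, EV per trade {expected_value(pnls):+.2f}")

for lev in (6, 8, 10, 20):
    print(f"{lev}x leverage: margin gone after a {wipeout_move(lev):.1%} adverse move")
```

In this toy data the model that wins only one trade in five still has a positive expected value, while the model that wins four in five loses money on average. The leverage loop shows why higher leverage accelerates losses: the tolerable adverse move shrinks in proportion to 1/leverage.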

How to evaluate an Agent

Because agents are still in their early stages, no comprehensive evaluation framework exists yet. Historical performance is often used as the benchmark, but it is driven by underlying factors that are themselves stronger indicators of a robust agent.

Performance under varying volatility: Disciplined loss control during deteriorating conditions demonstrates the agent’s ability to identify off-chain factors affecting trading profitability.

Transparency vs. Privacy: Both sides involve trade-offs. A transparent agent that can be actively copied for trading will essentially have no strategic advantage. A private agent, on the other hand, risks internal extraction by its creator, who can easily front-run their own users.

Source of information: The data sources integrated by the agent are critical in determining how the agent makes decisions. It is essential to ensure that these sources are trustworthy and not subject to single points of failure.

Security: It is crucial to have smart contract audits and an appropriate fund custody architecture to ensure backup measures are in place for black swan events.

Next steps for agents

Significant infrastructure work remains before agents can be adopted at scale. It comes down to critical issues around trust in agents and their execution. Where autonomous agent actions have lacked safeguards, instances of poor fund management have already occurred.

ERC-8004 launched in January 2026 as the first on-chain registry, enabling autonomous agents to discover each other, establish verifiable reputations, and collaborate securely. This represents a critical unlock for DeFi composability, as trust scores are embedded directly within smart contracts, enabling permissionless interactions between agents and protocols. However, it does not guarantee that agents will always act in good faith, as attacks such as reputation collusion and Sybil attacks can still occur. Significant opportunities remain in areas such as insurance, security, and economic staking of agents.

As agent activity expands in DeFi, strategy overcrowding becomes a structural risk. Yield farming provides the clearest precedent, where returns compress as strategies become popular. The same dynamic may apply to agent trading: if numerous agents are trained on similar data and optimize for similar objectives, they will converge on similar positions and similar exit signals.
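The yield farming precedent can be illustrated with a toy model. The sketch assumes a fixed annual reward budget shared pro rata across all deposited capital. The reward budget and TVL figures are hypothetical and are not taken from the report.

```python
def pool_apr(annual_rewards: float, tvl: float) -> float:
    """APR when a fixed reward budget is shared pro rata across all deposits."""
    return annual_rewards / tvl


# Hypothetical numbers: a $1M yearly reward budget and growing deposits.
ANNUAL_REWARDS = 1_000_000.0

for tvl in (10_000_000.0, 25_000_000.0, 50_000_000.0, 100_000_000.0):
    print(f"TVL ${tvl / 1e6:.0f}M -> APR {pool_apr(ANNUAL_REWARDS, tvl):.1%}")
```

As more capital follows the same strategy, the same reward budget is spread more thinly, so the return per dollar falls. Agent trading strategies do not have an explicit reward budget, but the available edge in a given market is similarly finite.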

The CoinAlg paper, published by Cornell University in January 2026, formalized a version of this problem. Transparent agents are susceptible to arbitrage because their trades are predictable and can be front-run. Private agents avoid this risk but introduce a different one: creators retain informational advantages over their users and can extract value from the very internal knowledge that opacity was meant to protect.

Agent activity will only continue to accelerate, and the infrastructure established today will determine how the next phase of on-chain finance operates. As adoption increases, agents will self-iterate and become more attuned to user preferences. Consequently, the primary differentiator will be trustworthy infrastructure, and those who provide it will capture the largest market share.