OPPO open-sources the Android AI agent framework X-OmniClaw

币界网
CoinDesk reports:

Multi-X, a team under OPPO, has released X-OmniClaw, an open-source Android AI agent framework. The project emphasizes an "edge-first" design: core control, perception, and execution run locally on the device, and cloud-based large models are called only for complex reasoning tasks.
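The edge-first split can be illustrated with a small routing sketch. Everything in it is hypothetical: the `Task` type, the `needs_complex_reasoning` flag, and the two handler functions are illustrative names, not part of X-OmniClaw's published code, and the article does not say how the real framework decides what counts as "complex".

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A unit of work for the agent (hypothetical type, for illustration only)."""
    description: str
    needs_complex_reasoning: bool = False  # assumed flag; the real criterion is not described


def run_on_device(task: Task) -> str:
    # Stand-in for local control, perception, and execution.
    return f"device:{task.description}"


def run_in_cloud(task: Task) -> str:
    # Stand-in for a call to a cloud-hosted large model.
    return f"cloud:{task.description}"


def route(task: Task) -> str:
    """Edge-first routing: stay on the device unless the task needs complex reasoning."""
    if task.needs_complex_reasoning:
        return run_in_cloud(task)
    return run_on_device(task)
```

With this sketch, `route(Task("tap search button"))` stays on the device, while a task created with `needs_complex_reasoning=True` is sent to the cloud handler.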

The framework is designed for scenarios where the phone serves as a continuous AI assistant rather than a one-time Q&A chat tool. According to the design OPPO has disclosed, the system understands the current environment by combining camera input, screen content, and voice input, then acts directly within real apps.

Core capabilities run on the local device

Many mobile AI systems currently run in the cloud, driving Android virtual environments on servers to mimic user actions. While this approach simplifies unified deployment, it prevents direct access to the user’s real camera, photo gallery, and local files.

X-OmniClaw takes the opposite approach. The technical report states that the framework runs directly on users' physical devices, narrowing the gap between virtual environments and real-world usage. OPPO divides the architecture into three components (perception, execution, and memory) that form a continuous loop.

  • The perception layer integrates camera, screen, and voice input.
  • The execution layer is responsible for identifying interfaces and completing clicks and navigation.
  • The memory layer stores contextual information across tasks and sessions.
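The three layers listed above can be pictured as one pass of a loop. The following is a minimal sketch under stated assumptions, not OPPO's implementation: `Observation`, `Memory`, `perceive`, `execute`, and `step` are hypothetical names, and the "policy" inside `execute` is a placeholder string rather than a real model call.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """The three inputs the perception layer is said to combine."""
    camera: str
    screen: str
    voice: str


@dataclass
class Memory:
    """Context kept across tasks and sessions (here just an in-memory list)."""
    events: list = field(default_factory=list)

    def remember(self, event: str) -> None:
        self.events.append(event)


def perceive(camera: str, screen: str, voice: str) -> Observation:
    return Observation(camera, screen, voice)


def execute(obs: Observation, memory: Memory) -> str:
    # Placeholder policy: act on the voice request, using the screen and
    # the number of earlier steps as context.
    return f"step {len(memory.events) + 1}: '{obs.voice}' on '{obs.screen}'"


def step(memory: Memory, camera: str, screen: str, voice: str) -> str:
    """One pass of the loop: perceive, execute, then record the outcome."""
    obs = perceive(camera, screen, voice)
    action = execute(obs, memory)
    memory.remember(action)
    return action
```

Calling `step` repeatedly with the same `Memory` object is what makes the loop continuous: each action is recorded and is visible to the next pass.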

Recognizing on-screen and real-world scenes

In the perception phase, the system first uses a vision-language model to understand the current scene, then determines the next action. For example, if a user points the camera at a product and asks for its price, the agent first identifies the object, then opens the corresponding shopping app and starts a search, rather than merely guessing from the text instruction.

The execution component combines XML interface data, on-device vision models, and OCR capabilities to determine exactly where to click on the page. Even when the interface contains numerous ads or incomplete structural information, the system can use visual recognition to assist in locating the target area.
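One way to picture this combination is a fallback chain: look for the target in the structural data first, and use recognized on-screen text when the hierarchy does not contain it. The sketch below assumes the XML resembles a UI Automator dump (`node` elements with `text` and `bounds="[x1,y1][x2,y2]"` attributes) and that OCR results arrive as `(text, box)` pairs. The function names are hypothetical, the on-device vision model is left out, and the article does not describe how X-OmniClaw actually merges the three signals.

```python
import re
import xml.etree.ElementTree as ET

# Bounds format used in UI Automator dumps: "[x1,y1][x2,y2]"
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) // 2, (y1 + y2) // 2)


def find_in_xml(xml_dump: str, label: str):
    """Return the bounds of the first node whose text equals label, or None."""
    root = ET.fromstring(xml_dump)
    for node in root.iter("node"):
        if node.get("text", "").strip().lower() == label.lower():
            m = BOUNDS_RE.fullmatch(node.get("bounds", ""))
            if m:
                return tuple(int(g) for g in m.groups())
    return None


def find_in_ocr(ocr_results, label: str):
    """ocr_results: list of (text, (x1, y1, x2, y2)) pairs from a hypothetical OCR step."""
    for text, box in ocr_results:
        if label.lower() in text.lower():
            return box
    return None


def locate_tap_point(xml_dump: str, ocr_results, label: str):
    """Prefer structural data; fall back to OCR when the hierarchy lacks the label."""
    box = find_in_xml(xml_dump, label) or find_in_ocr(ocr_results, label)
    return center(box) if box else None
```

The fallback order is a design choice made for this sketch: structural data gives exact bounds when it is present, while OCR still works on pages whose hierarchy is incomplete or cluttered with ad views.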

OPPO has also added behavior cloning capabilities. If a user manually demonstrates the path to a deeper page once, the system can subsequently use an Android deep link to jump straight to that page, reducing repetitive actions.
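The idea of replacing a demonstrated tap sequence with a single jump can be sketched as follows. This is an assumption-laden illustration: `Demonstration`, `PathLibrary`, and the `exampleapp://` URI are hypothetical, and the article does not explain how the framework derives a deep link from a demonstration. The `adb_command` helper only builds the standard `am start` invocation for firing a VIEW intent from a development machine; an on-device agent would presumably send the intent directly.

```python
from dataclasses import dataclass, field


@dataclass
class Demonstration:
    """One manual walkthrough: the taps the user made and where they ended up."""
    taps: list       # e.g. ["Me", "Settings", "Privacy"]
    deep_link: str   # URI of the final page (hypothetical; depends on the app)


@dataclass
class PathLibrary:
    learned: dict = field(default_factory=dict)

    def learn(self, goal: str, demo: Demonstration) -> None:
        """Store the destination so later runs can skip the intermediate taps."""
        self.learned[goal] = demo

    def plan(self, goal: str) -> list:
        """Return a one-step deep-link plan if the goal was demonstrated, else an empty plan."""
        demo = self.learned.get(goal)
        if demo is None:
            return []
        return [("open_deep_link", demo.deep_link)]


def adb_command(uri: str) -> list:
    """Argument list for opening a URI via adb; built here, not executed."""
    return ["adb", "shell", "am", "start",
            "-a", "android.intent.action.VIEW", "-d", uri]
```

After one call to `learn`, `plan` returns a single deep-link step for that goal instead of the three recorded taps; for goals that were never demonstrated it returns an empty plan, and the agent would have to navigate by tapping.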

Introducing cross-conversation semantic memory

Unlike conventional chatbots, X-OmniClaw emphasizes long-term semantic memory. The system not only retains context within a single task but also generates structured records about objects, scenes, and events based on album content, enabling future retrieval and execution.
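A structured record of this kind might look like the sketch below. The field names (`objects`, `scene`, `event`) follow the categories named in the report, but the `MemoryRecord` and `SemanticMemory` classes are hypothetical, and plain keyword matching stands in for whatever retrieval method the framework actually uses, which the article does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    """Structured description of one album item (field names are illustrative)."""
    path: str
    objects: set = field(default_factory=set)
    scene: str = ""
    event: str = ""


class SemanticMemory:
    def __init__(self):
        self.records = []

    def add(self, record: MemoryRecord) -> None:
        self.records.append(record)

    def search(self, keyword: str) -> list:
        """Return paths whose objects, scene, or event mention the keyword."""
        key = keyword.lower()
        hits = []
        for r in self.records:
            fields = {o.lower() for o in r.objects} | {r.scene.lower(), r.event.lower()}
            if any(key in f for f in fields):
                hits.append(r.path)
        return hits
```

Because the records persist independently of any one conversation, a later request can query them without the user having to describe the photos again.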

OPPO demonstrated use cases including math problem assistance and album video generation. In the former, the agent reads on-screen questions through a floating interface, works through them step by step, and automatically proceeds to the next question. In the latter, it filters relevant images from the album based on requests such as “parrot-themed photos” and then uses a deep link to open CapCut and batch-generate videos.

This means the role of the mobile AI agent is shifting from single-turn Q&A to continuous assistance. The report notes that X-OmniClaw is built on the open-source HermesApp codebase and borrows elements of OpenClaw's skill structure design. The project code has been released on GitHub, and OPPO plans to continue publishing related resources and releasing updates.
