Codex uses computers through three interfaces: Computer Use, Chrome Extension, and In-App Browser.


Editor’s Note: This article outlines three entry points for Codex to interact with external environments: Computer Use, Chrome Extension, and in-app Browser. Although all three appear to address the same issue—enabling Codex to use a computer—they correspond to distinct task scenarios, permission boundaries, and levels of trust.

Among these, Computer Use has the broadest coverage. It can interact directly with authorized native applications and system settings on macOS and Windows, as well as the iOS Simulator, and can automate workflows that span multiple applications. It is best for GUI-based processes that lack API, plugin, or structured tool support, at the cost of slower speed and the widest permission scope. The Chrome extension is best suited to tasks that rely on login sessions, cookies, multiple tabs, and browser identity, such as Gmail, LinkedIn, Salesforce, internal dashboards, or logged-in research across multiple websites. The in-app browser is designed primarily for development and debugging, especially local services, visual bugs, responsive layouts, and design annotations. It does not inherit the user’s normal browser login state, so it is more limited in capability but more strongly isolated.

The core insight is that Codex does not have just one way to "use a computer"; what truly matters is selecting the narrowest, safest, and most structured interface for each task. Use plugins or MCP whenever possible, rather than resorting to visual controls; prioritize the in-app browser for tasks involving web development; switch to Chrome only when user browser authentication and login state are required; and only use Computer Use as the final step when structured tools cannot suffice and the task absolutely depends on a desktop graphical interface.

Appshots is not a fourth way to control the computer, but a tool that presents the current screen context to Codex. It addresses the problem of context input, while Browser, Chrome, and Computer Use address action execution. Together, this layered approach reveals a key insight into AI agent productization: it’s not about granting models unlimited permissions, but about continuously narrowing permissions and defining clear boundaries within specific tasks, while preserving the user’s authority to review critical actions.

The following is the original text:

Codex offers three ways to use a computer: Computer Use, Chrome extension, and in-app browser.

There is some overlap between them, just enough to cause confusion.

After reading this article, you will know how to install and trigger these three methods, when to use each one, how Appshots and Developer Mode connect them, and what to write in AGENTS.md so that Codex can automatically select the appropriate interface.

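The simple version is: use the in-app browser for websites you are developing, Chrome when a task needs your logged-in browser session, and Computer Use when a task can only be done through a desktop graphical interface.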

That said, whenever possible, prioritize using plugins or MCPs. For example, a Slack plugin can retrieve a thread more precisely than clicking around within Slack; actions generated by a GitHub plugin are also easier to verify than having Codex drive a web page. Visual control is best suited for situations where structured tool capabilities reach their limits.

Use @Computer for everything else

Computer Use has the broadest coverage among these three interfaces. It enables Codex to view and interact with graphical interfaces on macOS and Windows, including windows, menus, keyboard input, and the clipboard within your authorized applications.

It is also typically the slowest. Structured plugins can directly call APIs; Computer Use, however, must observe the interface, determine where to click, wait for the application to respond, and then check the next state. This visual cycle takes time, but it also means Codex can interact with applications that have no available API at all.
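
The visual cycle described above can be pictured as a simple loop. This is a minimal sketch, not Codex's implementation: `run_visual_loop`, `ToyWizard`, and the three callables are hypothetical names invented for illustration, and the "application" is a toy three-screen wizard.

```python
from typing import Callable, Optional


def run_visual_loop(
    observe: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 20,
) -> int:
    """Repeat observe -> decide -> act until decide() returns None.

    The callables are hypothetical stand-ins, not Codex APIs.
    Returns the number of actions taken.
    """
    steps = 0
    while steps < max_steps:
        state = observe()        # look at the current screen
        action = decide(state)   # choose the next click or keystroke
        if action is None:       # goal reached, nothing left to do
            break
        act(action)              # perform it, then loop to re-check the state
        steps += 1
    return steps


class ToyWizard:
    """Toy stand-in for a GUI app: three screens advanced by one button."""

    def __init__(self) -> None:
        self.screens = ["welcome", "options", "done"]
        self.position = 0

    def observe(self) -> str:
        return self.screens[self.position]

    def act(self, action: str) -> None:
        if action == "click_next":
            self.position += 1


def decide(state: str) -> Optional[str]:
    # Stop once the final screen is visible; otherwise keep clicking "Next".
    return None if state == "done" else "click_next"


wizard = ToyWizard()
actions_taken = run_visual_loop(wizard.observe, decide, wizard.act)
```

Every action costs a fresh observation before the next decision, which is why this route is slower than a single structured API call.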

On macOS, slowness doesn’t necessarily mean it’s interrupting you. Computer Use can operate authorized apps in the background while you continue using the rest of your computer. Often, I’ll open an app while using Codex, only to discover that Codex has already quietly completed a workflow in the background.

Depending on which apps are installed and authorized on your computer, its targets can include Spotify, Xcode, System Settings, and the iOS Simulator, and it can even control your iPhone through iPhone Mirroring. It can also switch between multiple apps and handle workflows that span different applications.

Use it when the task depends on any of the following:

Native desktop applications, such as Spotify or financial apps;

iOS Simulator, iPhone Mirroring, or other processes that can only be operated through a graphical interface;

System or application settings;

Data sources without plugins or APIs;

Workflows that require switching between multiple applications;

The final missing step in a structured integration.

Installation method: Open Codex Settings > Computer Use, then click Install.

Trigger method: Mention @Computer or explicitly request Codex to use Computer Use. As the model's capabilities improve, it will also initiate this function autonomously when needed.

You can try a few examples first:

One of my favorite examples started when a package was stolen. Amazon told me I’d have to wait about 25 minutes to reach customer service, so I handed a Codex thread over to Computer Use, instructing it to check the chat window every five minutes, then switch to checking every minute once a representative appeared, and do its best to secure a refund for me. By the time I got back from my shower, the refund had already been processed.
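
The polling instruction in that story amounts to a two-speed schedule. The sketch below is only an illustration of that idea: the function and constant names are hypothetical, and staying on the fast interval after a representative has been seen once is an assumption, not something the story specifies.

```python
from typing import Iterable, List

SLOW_INTERVAL_S = 5 * 60  # while still waiting in the queue
FAST_INTERVAL_S = 60      # once a representative has appeared


def delays_between_checks(rep_visible_at_check: Iterable[bool]) -> List[int]:
    """Return the wait after each check, in seconds.

    Assumption: once a representative has been seen, stay on the fast
    interval even if a later check does not see them.
    """
    delays = []
    seen = False
    for visible in rep_visible_at_check:
        seen = seen or visible
        delays.append(FAST_INTERVAL_S if seen else SLOW_INTERVAL_S)
    return delays


# Two checks in the queue, then the representative shows up on the third.
schedule = delays_between_checks([False, False, True, False])
```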

I also use Computer Use as the "last mile" in a structured workflow. During one video release, Codex could read feedback from Slack, modify the code, and render a new video, but the Slack integration in that thread was unable to upload files. So, Computer Use clicked "Add file" to complete the missing step.

It also has the widest trust boundary of the three. Assign it one clearly defined application or process at a time. Keep sensitive applications closed when they are not part of the task, review permission prompts carefully, and have a person present to supervise whenever the task involves financial, account, payment, credential, privacy, or system security changes.

Use @Chrome to manage multiple tabs and login sessions

The Codex Chrome extension enables Codex to access your logged-in Chrome session. Use it when tasks depend on your account, cookies, browser profile, or open and authenticated tabs.

This type of interface is suitable for working with the following tools:

Gmail or LinkedIn;

Salesforce or a customer service back end;

Internal dashboards;

Logged-in research across multiple websites;

Forms that rely on your account or browser extension.

Installation method: Open Codex's Plugins, add Chrome, and follow the setup instructions. Codex will guide you through installing the Codex Chrome extension and approving Chrome permissions. Once the extension shows "Connected," start a new thread.

Trigger method: Mention @Chrome, or explicitly request that Codex use your logged-in Chrome browser:

Chrome tasks run within tab groups, helping you keep all tabs related to a specific Codex thread together. Unlike the in-app browser, this interface carries your browser identity, making it more powerful and more sensitive.

Another key advantage is multi-tab control. Chrome allows multiple tabs to be associated with the same task, enabling you to read context on one tab, cross-reference information on another, and continue your workflow on a third. While Computer Use can also drive the browser visually, Chrome understands the task as a browser workflow rather than a sequence of screen coordinates.

Recently, I had a thread where I handed over an open Strudel Composer tab to Codex and asked it to make the music more interesting. Chrome provided it with the selected tab, along with the WebMCP tools exposed by the page. Codex examined the musical structure, rewrote the harmony and four-minute overall form, adjusted the tempo, saved the track, and let it continue playing. It didn’t need to visually search for every control on the interface because Chrome could combine the tab’s context with the page’s structured capabilities.

I also use it to run a long-term Twitter thread. The general instructions are:

The interesting part isn't that Codex can open Twitter, but that this thread can consistently return to the same logged-in workspace, link discovered content to local files, and leave behind a result I can review.

The trust boundary here is critical. A website may treat Codex’s clicks, form submissions, and sent messages as actions taken by you, and the page content itself is untrusted input. Separate out the steps with serious consequences: research, navigation, and drafting can be automated, but you must review before anything is sent, published, purchased, or submitted.
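
One way to picture that split is a small gate that lets only low-consequence steps run unattended. This is a sketch under stated assumptions: `Decision`, `AUTOMATABLE`, and `gate` are hypothetical names, the step labels are illustrative, and sending unknown step kinds to review is my own default rather than documented Codex behavior.

```python
from enum import Enum


class Decision(Enum):
    AUTOMATE = "automate"
    REVIEW = "review before acting"


# Step kinds the article treats as safe to automate (labels are illustrative).
AUTOMATABLE = {"research", "navigate", "draft"}


def gate(step_kind: str) -> Decision:
    """Allow only known low-consequence steps to run unattended.

    Sending, publishing, purchasing, submitting, and any step kind not
    listed above go to human review (the default is an assumption).
    """
    if step_kind in AUTOMATABLE:
        return Decision.AUTOMATE
    return Decision.REVIEW


plan = ["research", "draft", "send"]
decisions = [gate(step).name for step in plan]
```

Using an allowlist rather than a blocklist means a step nobody anticipated is held for review instead of being run by default.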

If the entire task is completed within the browser, prioritize using Chrome over Computer Use. Chrome provides the native browser context required for these tasks without expanding access to the entire desktop.

Use the in-app @Browser to handle the website you're developing

The in-app browser is a browser embedded within the Codex thread. You share the same rendered page with Codex, making it ideal for building and debugging web applications.

I usually start here for:

Local development servers;

File preview pages;

Public pages that do not require login;

Reproducing visual bugs;

Checking responsive layouts;

Giving design feedback on page elements.

Its most important constraint is isolation. The in-app browser does not use your regular browser profile, cookies, extensions, login sessions, or existing tabs. This is a limitation when a task requires account authentication; however, when a task doesn’t require an account, it becomes a useful boundary.

Setup method: Open Codex's Plugins, add the Browser plugin, and enable it.

Trigger method: Mention @Browser in the prompt, or explicitly request that Codex use the in-app browser:

This creates a tight feedback loop: Codex can edit code, manipulate pages, check rendering states, take screenshots, and then revalidate the same process after fixing it.

My favorite feature is the annotation tool. When reviewing a local app, I can simply click on an element or select an area to leave comments. The style controls also allow me to precisely preview and provide feedback on text, fonts, spacing, and colors. I usually combine it with voice input and guided workflows: I review the page, leave comments, and continue adding more feedback while Codex processes the current input. The page itself becomes the specification document.

This is especially useful for design work. I often ask Codex to consolidate an idea, a research packet, or a project status into a single index.html file, then open it in the in-app browser. Instead of trying to describe an entire design set in another prompt, I can directly annotate the actual page: “This hierarchy is reversed,” “Don’t make it look so much like a card,” “These controls need more space,” or “Use this type scale site-wide.” Codex receives comments with relevant screenshots and element context, modifies the file, and reopens the same page for the next round.

This workflow feels more like collaborating with a designer on the same canvas than passing screenshots and written instructions back and forth.

The in-app browser is also well-suited as a starting point for hybrid workflows. In another thread, I opened an X post in the in-app browser and asked Codex to investigate the related discussion. The visible page helped it confirm which post I meant; afterward, Codex switched to the Twitter CLI and retrieved 38 replies, including nested replies hidden from the browser view. This exemplifies the principle of “using the narrowest interface possible”: use the browser to confirm on-screen context, then employ structured tools for deeper retrieval.

There are trade-offs here. The isolated nature of the in-app browser makes it an excellent development interface, but it also means it’s not suitable for handling Google login, passkeys, or websites that rely on browser extensions. Switch to Chrome when identity matters.
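
Taken together, the three sections suggest a "narrowest interface first" rule. The sketch below encodes one reading of it; the `Task` fields and `choose_interface` are hypothetical names for illustration, not part of Codex, and real tasks will not always reduce to four flags.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a task; the field names are illustrative."""
    has_plugin_or_mcp: bool = False       # a structured tool covers the task
    runs_in_browser: bool = False         # the whole task happens on web pages
    needs_browser_identity: bool = False  # depends on logins, cookies, or extensions
    needs_desktop_gui: bool = False       # only reachable through a native GUI


def choose_interface(task: Task) -> str:
    """Return the narrowest interface that can do the job."""
    if task.has_plugin_or_mcp:
        return "plugin or MCP"
    if task.runs_in_browser:
        # Identity decides between the isolated browser and the user's Chrome.
        return "@Chrome" if task.needs_browser_identity else "@Browser"
    if task.needs_desktop_gui:
        return "@Computer"
    return "no GUI interface needed"


local_dev = choose_interface(Task(runs_in_browser=True))
gmail = choose_interface(Task(runs_in_browser=True, needs_browser_identity=True))
simulator = choose_interface(Task(needs_desktop_gui=True))
```

The checks are ordered from the narrowest permission scope to the widest, so Computer Use is reached only when nothing more structured applies.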

Appshots

An Appshot is not a fourth way for Codex to control your computer. It’s a way to point Codex at the context in front of you.

On a Mac, press the Command key twice to capture the frontmost window. Codex attaches an image, along with all available text, to the thread. You can take an Appshot of an error, an email, a design, a settings panel, or an unfamiliar form, and then simply say:

This is the mental model I find easiest to remember: Appshots are how you point to something on your computer; Browser, Chrome, and Computer Use are how Codex takes action.

Appshots are currently created via the Codex app on macOS. Each one captures the frontmost window, not the entire desktop, which makes it a useful way to provide focused context without granting control over the app.

How to keep up with these developments

These interface changes happen quickly. If you want practical details instead of waiting for a massive release summary:

Follow Ari Weinstein (@AriX) for insights on Computer Use and Appshots;

Follow James Sun (@JamesZmSun) for content related to Browser;

Follow Andrew Ambrosino (@ajambrosino) for updates on the Codex app launch and the broader desktop product narrative;

Follow OpenAI Developers (@OpenAIDevs) for broader news on Codex and the OpenAI Platform.
