Do you still think text-to-image is stuck at Nano Banana?
Kids, times have changed again.

@johnAGI168 https://x.com/johnAGI168/status/2044781168151724067

@0115hippo https://x.com/0115hippo/status/2044722124611539160
In early April, three anonymous image models appeared on the LM Arena evaluation platform, identified by the aliases maskingtape-alpha, packingtape-alpha, and gaffertape-alpha. They disappeared a few hours later.
OpenAI has not yet officially announced this model, but based on API metadata and user-side test records, it has already acquired a widely accepted name: GPT Image 2.

Screenshots can no longer be used as evidence.
One of the most obvious weaknesses of AI image generation models over the past few years has been text within images. In the DALL-E 3 era, asking it to include "Hello" in an image might get you "Hellp" or even "Hl10," the letters staggering as if drunk. GPT Image 1 improved significantly and could handle simple English labels. By GPT Image 1.5, accuracy on English text had reached nearly 95%, but the model still struggled with non-Latin scripts such as Chinese, Japanese, and Korean.
However, the leaked sample images of GPT Image 2 changed this impression.


@MrLarus https://x.com/MrLarus/status/2044824800909054181


@akokoi1 https://x.com/akokoi1/status/2044789531615056175
In the leaked samples, text in the image renders exactly as requested: Chinese characters are clear, with accurate shapes and complete strokes. One tester generated an ID-card-style image in which the name, address, and ID number were all rendered correctly, with a neat layout; at first glance it looked like a photo of a genuine document.

This is good news. Advances in text rendering mean that generating infographics, posters, product packaging, and charts with complex layouts is now far more reliable.
But every coin has two sides. A model capable of generating highly realistic document templates and accurately rendering UI screenshots makes the idea that “screenshots can serve as evidence” increasingly questionable.
By comparison, this is the core differentiator of the GPT Image series. Midjourney has made little progress in text rendering, and the Stable Diffusion series still struggles with the same longstanding issues. According to leaked Arena test results, GPT Image 2 outperforms Midjourney in four key areas: text rendering, instruction following, photorealism, and world knowledge. Midjourney's remaining advantages are artistic style and aesthetic control.

Does it really know what this world looks like?
A tester asked the model to generate a hypothetical GPT-8 product pricing page; the resulting image accurately replicated OpenAI’s official website style, with button placements and font choices resembling those taken from a real interface, and the pricing table’s hierarchical logic was correctly structured.

GPT Image 2 can generate images that closely resemble real software interfaces, including browser windows, mobile app interfaces, and data visualization charts, with a level of fidelity unmatched by the previous generation.


@levelsio https://x.com/levelsio/status/2040333489476681758
This will enable several practical applications. Designers can describe the desired interface in text, and it will instantly generate a reference image for team discussions—no need to first open Figma and draw numerous wireframes. When creating investor decks, you can display a “product screenshot” without waiting for engineers to write code. When writing documentation, sample interface visuals can be generated directly, eliminating the need to stare at a blank page wondering where to find screenshots.



@marmaduke091 https://x.com/marmaduke091/status/2040338311873515597
Generating images is no longer just about "generating images."
OpenAI has announced that DALL-E 2 and DALL-E 3 will be officially discontinued on May 12, 2026. DALL-E 3 on Azure OpenAI was already retired in February.
DALL-E was where many people first encountered AI-generated images, and it’s only been a few years since those early, blurry creations.
Meanwhile Google, which only cemented its industry position in early 2026 with Nano Banana Pro, may now feel the pressure. Early test results show GPT Image 2 beating Nano Banana Pro on realism, text rendering, and world knowledge at once; a clean sweep across all three is uncommon.
For creators, the feelings are mixed. Illustrators, graphic designers, and photographers are not facing this question for the first time. Since the release of GPT Image 1, the number of freelance graphic design listings has dropped by roughly 18%. In some scenarios AI has indeed replaced the decision to hire someone for the job, but it is also creating new ways of working that let one person accomplish more.
The evolution of image generation models has accelerated to the point where there’s little time left to adapt. GPT Image 1 progressed to version 1.5 in just a few months, and from 1.5 to 2, roughly half a year. Each generation addresses the core limitations of the previous one while unlocking new possibilities.
GPT Image 2 is currently in an A/B testing phase, with some ChatGPT users having been randomly granted access. The official release is widely predicted to occur around May, coinciding with the retirement of DALL-E. To try it early, you can try your luck on the LM Arena evaluation platform.

Test Address: https://arena.ai
Based on community feedback and the known strengths of this model, the following prompt template can maximize your chances of success:
UI/Screenshot Prompt: A photorealistic screenshot of a mobile banking app, clearly displaying transaction records with legible dates, amounts, and merchant names. iPhone 16 screen, naturally held in hand, with a coffee shop background.
Product label description: A photorealistic image of a craft beer bottle, with clear details on the label showing the brewery name "Oakridge Brewing Co.", alcohol content of 6.8%, a mountain emblem, and the ingredient list. Studio lighting with a white background.
Signage prompt: A street scene of a Tokyo alley at night, featuring multiple bilingual Japanese-English neon signs, including a ramen shop sign reading “Ichiban Ramen — Est. 1987,” a karaoke bar sign, and various glowing advertisements. The wet sidewalk after rain reflects the lights.
Interface/World knowledge prompt: A photorealistic screenshot of a YouTube video titled “How to Build a PC in 2026,” with 2.3 million views, featuring realistic comments, sidebar recommended videos, and channel information. Desktop browser view.
Widescreen trigger: This is a cinematic widescreen photo capturing the exterior of an IKEA store at dusk, featuring the glowing IKEA sign, realistic cars in the parking lot, and shoppers coming and going. Golden hour lighting, aspect ratio 16:9.
Image sources and references not credited inline: https://miraflow.ai/blog/how-to-use-duct-tape-ai-model-arena-gpt-image-2-guide
This article is from the WeChat public account "APPSO," authored by: Discovering Tomorrow's Products
