Alibaba open-sources Qwen-Image-Bench; GPT Image 2 leads in five categories

Summary

Alibaba's Qwen team has open-sourced Qwen-Image-Bench, a benchmark for evaluating text-to-image generation. It assesses models across five dimensions: image quality, aesthetics, text-image alignment, real-world fidelity, and creative generation. GPT Image 2 scored 64.69 and led in all five dimensions, followed by Nano Banana 2.0 and GPT Image 1.5. Qwen Image 2.0 Pro ranked fifth with a score of 57.84.
ME AI News: According to monitoring by Beating, Alibaba’s Qwen team has open-sourced Qwen-Image-Bench, a new evaluation benchmark designed to assess large models’ text-to-image (T2I) capabilities. Alongside it, the team released Q-Judger, a unified visual judge model trained extensively on top of Qwen3.6-27B.

The benchmark simulates professional artistic workflows and evaluates five dimensions: image quality, aesthetics, text-image alignment, and two newly added criteria, real-world fidelity and creative generation. Together these cover 23 sub-capabilities and 56 detailed metrics. Qwen-Image-Bench includes 1,000 bilingual prompts, evenly split between short and long descriptions, and each prompt tests more than four dimensions at once on average.

To make the judging precise, the training data for Q-Judger was annotated blind and triple-reviewed under the supervision of 80 professional reviewers from art schools, yielding more than 130,000 bilingual expert-labeled pairs. The model outputs structured scores for all 56 metrics and reaches a 92% agreement rate with human expert ratings.

Initial evaluations of 18 leading image generation models show GPT Image 2 in first place with a composite score of 64.69, ranking first in all five dimensions. Nano Banana 2.0 (59.82), GPT Image 1.5 (59.65), and Nano Banana Pro (59.45) place second, third, and fourth respectively. Alibaba’s proprietary Qwen Image 2.0 Pro ranks fifth with 57.84, while GLM Image lags behind at 48.19. The data indicates that real-world fidelity and creative generation are the dimensions that most clearly separate model tiers.

The evaluation also reveals technical bottlenecks common across the industry: AI image models frequently struggle to depict human hand anatomy, to represent physical laws such as gravity and lighting accurately, and to avoid objects passing through one another. Even the top models score below 44 in these areas. (Source: BlockBeats)
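To put the reported composite scores side by side, here is a minimal Python sketch that sorts the six scores quoted above and computes each model's gap to the leader. The function name `gaps_to_leader` and the dictionary layout are illustrative assumptions, not part of Qwen-Image-Bench. The article does not describe how the composite score is derived from the 56 metrics, so the sketch only works with the published totals.

```python
# Composite scores as quoted in the article. 18 models were evaluated,
# but only these six scores are reported, so this is a partial table.
REPORTED_SCORES = {
    "GPT Image 2": 64.69,
    "Nano Banana 2.0": 59.82,
    "GPT Image 1.5": 59.65,
    "Nano Banana Pro": 59.45,
    "Qwen Image 2.0 Pro": 57.84,
    "GLM Image": 48.19,
}


def gaps_to_leader(scores):
    """Return (model, score, gap) tuples sorted by score, highest first.

    `gap` is the leader's score minus the model's score, rounded to
    two decimals to match the precision of the reported figures.
    """
    leader = max(scores.values())
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, round(leader - score, 2)) for name, score in ordered]


if __name__ == "__main__":
    for name, score, gap in gaps_to_leader(REPORTED_SCORES):
        print(f"{name:<20} {score:6.2f}  (-{gap:.2f})")
```

The article gives ranks only for the top five models, so GLM Image's position in this list reflects its order among the six quoted scores, not its rank among all 18 models.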
Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.