Alibaba open-sources Qwen-Image-Bench; GPT Image 2 leads in all five dimensions

ME AI News: According to monitoring by Beating, Alibaba’s Qwen team has open-sourced Qwen-Image-Bench, a new benchmark for evaluating the text-to-image (T2I) capabilities of large models. Alongside it, the team released Q-Judger, a unified visual judge model trained on top of Qwen3.6-27B. The benchmark simulates professional artistic workflows and evaluates five dimensions: image quality, aesthetics, text-image alignment, and two newly added criteria, real-world fidelity and creative generation. These break down into 23 sub-capabilities and 56 detailed metrics. Qwen-Image-Bench includes 1,000 bilingual prompts, evenly split between short and long descriptions, and each prompt tests more than four dimensions on average. To make the scoring precise, Q-Judger’s training data was produced through blind, triple-review annotation under the supervision of 80 professional reviewers from art schools, and comprises over 130,000 bilingual expert-labeled pairs. The model outputs structured scores for all 56 metrics and reaches 92% agreement with human expert ratings. Initial evaluations of 18 leading image generation models show GPT Image 2 in first place with a composite score of 64.69, ranking first in all five dimensions. Nano Banana 2.0 (59.82), GPT Image 1.5 (59.65), and Nano Banana Pro (59.45) place second, third, and fourth respectively. Alibaba’s in-house Qwen Image 2.0 Pro ranks fifth with 57.84, while GLM Image lags behind at 48.19. The data indicates that real-world fidelity and creative generation are the dimensions that most clearly separate model tiers. The evaluation also reveals technical bottlenecks common across the industry: AI image models frequently struggle with human hand anatomy, with physical laws such as gravity and lighting, and with object interpenetration. Even the top models score below 44 on these items. (Source: BlockBeats)