TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
Abstract
Visual Text Rendering (VTR) remains a critical challenge in text‑to‑image generation, where even advanced models frequently produce text with structural anomalies such as distortion, blurriness, and misalignment. We find that leading MLLMs and specialist OCR models largely fail to perceive these structural anomalies, creating a critical bottleneck for both VTR evaluation and Reinforcement Learning (RL)‑based optimization: current evaluators and reward models lack fine‑grained structural perception. As a result, even state‑of‑the‑art generators (e.g., SeedDream4.0, Qwen‑Image) still struggle to render structurally faithful text. To address this, we propose TextPecker, a plug‑and‑play, structural‑anomaly‑perceptive RL strategy that mitigates noisy reward signals and works with any text‑to‑image generator. To enable this capability, we construct a recognition dataset with character‑level structural‑anomaly annotations and develop a stroke‑editing synthesis engine to expand structural‑error coverage. Experiments show that TextPecker consistently improves diverse text‑to‑image models; even on the well‑optimized Qwen‑Image, it yields average gains of 4\% in structural fidelity and 8.7\% in semantic alignment for Chinese text rendering, establishing a new state of the art in high‑fidelity VTR. Our work fills a gap in VTR optimization, providing a foundational step toward reliable and structurally faithful visual text generation.