DPGF-Net: Dual-Prior Guided Fusion Network for Joint Assessment of Perceptual Quality and Semantic Consistency in AI-Generated Images
Abstract
The rapid development of AI-generated image technology calls for effective AI-generated image quality assessment (AGIQA) methods that jointly evaluate visual quality and text-image alignment, ensuring that generated content is both visually appealing and faithful to the user's prompt. However, visual degradation and text-image misalignment often co-occur, making it difficult to tell whether a poor subjective rating stems from prompt noncompliance or from rendering artifacts. Disentangling image content from rendering distortions is therefore essential. To address this issue, we propose the dual-prior guided fusion network (DPGF-Net), which leverages image-side priors to disentangle distortions from content and combines them with text-side prompt templates to model their interactions. DPGF-Net employs a local text-conditioned aggregation branch that highlights semantically relevant, quality-sensitive regions, together with a global modulation branch that captures holistic perceptual characteristics. An adaptive fusion module then combines the two branches into a single quality score. Experiments on three AGIQA datasets demonstrate that our method correlates strongly with human judgments while achieving lower prediction error and stable evaluation behavior. The code will be released upon acceptance.
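To make the two-branch design concrete, the following is a minimal sketch of the adaptive fusion step described above: a learned gate weighs a local text-conditioned score against a global perceptual score to produce one quality value. All function and parameter names (`adaptive_fusion`, `w_gate`, `w_local`, `w_global`) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    # Numerically standard logistic function
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(local_feat, global_feat, w_gate, b_gate, w_local, w_global):
    """Hypothetical adaptive fusion of the two branch outputs.

    local_feat  : feature vector from the text-conditioned aggregation branch
    global_feat : feature vector from the global modulation branch
    The gate produces a weight alpha in (0, 1) that interpolates between
    the two branch-specific scalar scores.
    """
    s_local = float(local_feat @ w_local)    # local branch quality score
    s_global = float(global_feat @ w_global)  # global branch quality score
    gate_in = np.concatenate([local_feat, global_feat])
    alpha = sigmoid(float(gate_in @ w_gate) + b_gate)  # learned fusion weight
    return alpha * s_local + (1.0 - alpha) * s_global  # final fused score

# Toy usage with random features and weights (illustration only)
rng = np.random.default_rng(0)
d = 8
local_feat, global_feat = rng.normal(size=d), rng.normal(size=d)
score = adaptive_fusion(local_feat, global_feat,
                        rng.normal(size=2 * d), 0.0,
                        rng.normal(size=d), rng.normal(size=d))
```

Because alpha lies strictly between 0 and 1, the fused score is always a convex combination of the two branch scores, which keeps the final prediction on the same scale as each branch.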