D$^2$-FOSA: Dual-Diffusion Guided EEG-to-Image Reconstruction with Frequency-Oriented Semantic Alignment
Yu Chenglong ⋅ Shuai Shen ⋅ Xiangsheng Li ⋅ Yang Li
Abstract
Reconstructing visual semantics from electroencephalography (EEG) signals enables a deeper understanding of human visual cognition and supports next-generation brain–computer interface (BCI) applications. Despite notable advances in recent years, most existing EEG encoders still struggle to capture the frequency-specific neural dynamics that reflect perceptual and cognitive rhythms. Moreover, cross-modal alignment between EEG and visual content remains insufficiently addressed, leading to limited semantic consistency and visual fidelity. To address these issues, we propose D$^2$-FOSA, a unified dual-diffusion guided framework with frequency-oriented semantic alignment, which strengthens frequency-aware EEG representations for more semantically aligned image reconstruction. Specifically, we design a Frequency-Spatio-Temporal Dynamics Encoder (FSTDE), built on the Frequency-Oriented Mamba (FOMamba), to explicitly model oscillatory patterns and long-range dependencies in EEG signals. The extracted features are then pulled into the CLIP-aligned visual semantic space via contrastive learning. Meanwhile, a Dual Diffusion Latent Generator (DDLG) with bidirectional EEG–image conditioning enforces cross-modal alignment and promotes cycle-consistent generation. Extensive experiments on four challenging datasets demonstrate that D$^2$-FOSA significantly outperforms existing methods on both retrieval and reconstruction tasks. In particular, D$^2$-FOSA surpasses the contemporary MB2C method by more than 20 FID points on the THINGS-EEG reconstruction task, indicating a substantial improvement in perceptual fidelity. The source code is provided in the supplementary material.
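The abstract mentions pulling EEG features into the CLIP visual semantic space via contrastive learning but does not spell out the objective. The sketch below is a minimal illustration assuming a standard CLIP-style symmetric InfoNCE loss; the function and tensor names (`clip_style_alignment_loss`, `eeg_emb`, `img_emb`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def clip_style_alignment_loss(eeg_emb: torch.Tensor,
                              img_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss pulling EEG embeddings toward the
    CLIP image embeddings of the corresponding visual stimuli.

    eeg_emb: (B, D) output of an EEG encoder (e.g., FSTDE in the paper)
    img_emb: (B, D) frozen CLIP image embeddings, row-aligned with eeg_emb
    """
    # L2-normalize so the dot product is cosine similarity.
    eeg_emb = F.normalize(eeg_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)

    # (B, B) similarity matrix; matching EEG-image pairs lie on the diagonal.
    logits = eeg_emb @ img_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both matching directions (EEG-to-image, image-to-EEG).
    loss_e2i = F.cross_entropy(logits, targets)
    loss_i2e = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_e2i + loss_i2e)
```

With this kind of objective, in-batch non-matching pairs serve as negatives, so each EEG embedding is pushed toward the CLIP embedding of its own stimulus and away from all others; the actual loss used by D$^2$-FOSA may differ in its details.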