Abstract:
Due to the difficulty of scaling up, generative adversarial networks (GANs) appear to be falling out of favor for text-conditioned image synthesis. Sparsely activated mixture-of-experts (MoE) has recently been shown to be a viable approach to training large-scale models with limited resources. Inspired by this, we present Aurora, a GAN-based text-to-image generator that employs a collection of experts to learn feature processing, together with a sparse router that adaptively selects the most suitable expert for each feature point. We adopt a two-stage training strategy that first learns a base model at a low resolution and then an upsampler to produce high-resolution images. Trained with only public data, our approach encouragingly narrows the performance gap between GANs and industry-level diffusion models while maintaining fast inference. We will release the code and checkpoints to help the community conduct more comprehensive studies of GANs.
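To make the sparse-routing idea concrete, below is a minimal PyTorch sketch of a sparsely activated MoE layer in which a router picks one expert per feature point. This is only an illustration under assumed design choices (top-1 routing, feed-forward experts, the names SparseMoELayer/num_experts), not Aurora's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    # NOTE: hypothetical toy layer, not Aurora's architecture.
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        # Each expert is a small feed-forward block over channel features.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        # The router scores every expert for every feature point.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_points, dim), e.g. flattened spatial feature points.
        logits = self.router(x)                  # (B, N, num_experts)
        probs = F.softmax(logits, dim=-1)
        gate, idx = probs.max(dim=-1)            # top-1: best expert per point
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                      # points routed to expert e
            if mask.any():
                # Scale by the gate so routing stays differentiable.
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route 64 feature points of dimension 128 through 4 experts.
layer = SparseMoELayer(dim=128, num_experts=4)
y = layer(torch.randn(2, 64, 128))
print(y.shape)  # torch.Size([2, 64, 128])

Because only one expert runs per feature point, the per-point compute stays roughly constant as experts are added, which is what lets MoE models grow in capacity without a proportional increase in cost.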