VoxFace: Streaming Audio-Visual Synthesis via Relay-Style Multi-Token Prediction for Interactive Conversation
Junwen Xiong, Chuanyue Li, Peng Zhang