

CVPR 2024 Accepted Papers

Papers are assigned to poster sessions such that topics are maximally spread over sessions (attendees will find interesting papers at each session) while grouping similar posters within each poster session to minimize walking distances. We used a 1D t-SNE projection of the SPECTER paper embeddings to realize this assignment.
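The assignment rule described above can be illustrated with a short sketch. It is not the organizers' actual procedure. It assumes each paper already has a single 1D coordinate, for example from a 1D t-SNE of its SPECTER embedding computed elsewhere. The round-robin rule and the names `assign_sessions` and `coords` are hypothetical. Sorting papers along the 1D topic axis and dealing them out round-robin places neighbouring topics in different sessions, so every region of the topic axis appears in every session. Within a session, boards stay in coordinate order, so similar posters end up physically close.

```python
from typing import List, Sequence


def assign_sessions(coords: Sequence[float], num_sessions: int) -> List[List[int]]:
    """Hypothetical sketch: assign paper indices to poster sessions.

    coords: one precomputed 1D embedding coordinate per paper (assumed to
        come from e.g. a 1D t-SNE; not computed here).
    Returns one list of paper indices per session. Each list is in
    ascending coordinate order, which is also the assumed board order.
    """
    if num_sessions < 1:
        raise ValueError("num_sessions must be positive")
    # Sort paper indices along the 1D topic axis.
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    sessions: List[List[int]] = [[] for _ in range(num_sessions)]
    # Round-robin over the sorted order: consecutive (similar) papers go to
    # different sessions, spreading every topic across all sessions.
    for rank, paper in enumerate(order):
        sessions[rank % num_sessions].append(paper)
    # Papers were appended in sorted order, so each session list is already
    # ordered by coordinate and adjacent boards hold similar papers.
    return sessions


if __name__ == "__main__":
    # Six papers with made-up 1D coordinates, split over two sessions.
    print(assign_sessions([0.9, -1.2, 0.1, 2.5, -0.3, 1.7], 2))
```

Splitting the sorted list into contiguous blocks would achieve the opposite effect: each session would cover only one narrow band of topics. That is why this sketch deals papers out round-robin instead.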


MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain (Technische Universität München) · Tobias Kirschstein (Department of Informatics, Technische Universität München) · Markos Georgopoulos (Synthesia) · Martin Rünz (Synthesia) · Lourdes Agapito (University College London) · Matthias Nießner (Technical University of Munich)
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Shenhan Qian (Technische Universität München) · Tobias Kirschstein (Department of Informatics, Technische Universität München) · Liam Schoneveld (Woven by Toyota) · Davide Davoli (Toyota Motor Europe NV/SA associated partner by contracted services) · Simon Giebenhain (Technische Universität München) · Matthias Nießner (Technical University of Munich)
DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
Tobias Kirschstein (Department of Informatics, Technische Universität München) · Simon Giebenhain (Technische Universität München) · Matthias Nießner (Technical University of Munich)
DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Khawar Islam (FloppyDisk.AI) · Muhammad Zaigham Zaheer (Mohamed bin Zayed University of Artificial Intelligence) · Arif Mahmood (Information Technology University, Lahore) · Karthik Nandakumar (Mohamed Bin Zayed University of Artificial Intelligence)
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang (Fudan University) · Dingkang Yang (Fudan University) · Zhaoyu Chen (Fudan University) · Yang Liu (Fudan University) · Siao Liu (Fudan University) · Wenqiang Zhang · Lihua Zhang (Fudan University) · Lizhe Qi (Fudan University)
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
Boyang Peng (Tongji University) · Sanqing Qu (Tongji University) · Yong Wu (Tongji University) · Tianpei Zou (Tongji University) · Lianghua He (Tongji University) · Alois Knoll (Technical University Munich) · Guang Chen (Tongji University) · Changjun Jiang (Tongji University)
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
Takami Sato · Justin Yue (University of California, Irvine) · Nanze Chen (University of Cambridge) · Ningfei Wang (University of California, Irvine) · Alfred Chen (University of California, Irvine)
Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin (University of British Columbia) · Gopal Sharma · Shweta Mahajan (University of British Columbia) · Xingzhe He · Hossam Isack (Google) · Abhishek Kar (Google) · Helge Rhodin (UBC) · Andrea Tagliasacchi (Simon Fraser University, Google Brain) · Kwang Moo Yi (University of British Columbia)
Accelerating Neural Field Training via Soft Mining
Shakiba Kheradmand (University of British Columbia) · Daniel Rebain · Gopal Sharma · Hossam Isack (Google) · Abhishek Kar (Google) · Andrea Tagliasacchi (Simon Fraser University, Google Brain) · Kwang Moo Yi (University of British Columbia)
Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
Alexander Raistrick (Princeton University) · Lingjie Mei (Princeton University) · Karhan Kayan (Princeton University) · David Yan (Princeton University) · Yiming Zuo (Princeton University) · Beining Han (Department of Computer Science, Princeton University) · Hongyu Wen (Princeton University) · Meenal Parakh (Princeton University) · Stamatis Alexandropoulos (Princeton University) · Lahav Lipson (Princeton University) · Zeyu Ma (Princeton University) · Jia Deng (Princeton University)
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
Nastaran Saadati (Iowa State University) · Minh Pham (New York University) · Nasla Saleem (Iowa State University) · Joshua R. Waite (Iowa State University) · Aditya Balu (Iowa State University) · Zhanhong Jiang (Iowa State University) · Chinmay Hegde (New York University) · Soumik Sarkar (Iowa State University)
Fourier-basis functions to bridge augmentation gap: Rethinking frequency augmentation in image classification
Puru Vaish (University of Twente) · Shunxin Wang (University of Twente) · Nicola Strisciuglio (University of Twente)
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Phuc Nguyen (VinAI Research) · Tuan Duc Ngo (UMass Amherst) · Evangelos Kalogerakis (UMass Amherst) · Chuang Gan (MIT-IBM Watson AI Lab) · Anh Tran (VinAI Research) · Cuong Pham (Posts & Telecommunications Institute of Technology and VinAI Research) · Khoi Nguyen (VinAI Research)
Perceptual-Oriented Video Frame Interpolation Via Asymmetric Synergistic Blending
Guangyang Wu (Shanghai Jiaotong University) · Xin Tao (Kuaishou) · Changlin Li (SeeKoo) · Wenyi Wang (University of Electronic Science and Technology of China) · Xiaohong Liu (Shanghai Jiao Tong University) · Qingqing Zheng
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
Qian Wang (Peking University) · Weiqi Li (Peking University) · Chong Mou (Peking University) · Xinhua Cheng (Peking University) · Jian Zhang (Peking University)
Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations
Rui Zhao · Ruiqin Xiong (Peking University) · Jing Zhao (CNCERT) · Jian Zhang (Peking University) · Xiaopeng Fan (Harbin Institute of Technology) · Zhaofei Yu (Peking University) · Tiejun Huang (Peking University)
Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
Yanchen Dong (Peking University) · Ruiqin Xiong (Peking University) · Jian Zhang (Peking University) · Zhaofei Yu (Peking University) · Xiaopeng Fan (Harbin Institute of Technology) · Shuyuan Zhu (University of Electronic Science and Technology of China) · Tiejun Huang (Peking University)
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
Yuan Zhang (Peking University) · Tao Huang (The University of Sydney) · Jiaming Liu (Peking University) · Tao Jiang (Zhejiang University) · Kuan Cheng (Peking University) · Shanghang Zhang (Peking University)
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Shanshan Zhong (SUN YAT-SEN UNIVERSITY) · Zhongzhan Huang (Sun Yat-Sen University) · Shanghua Gao (Harvard University) · Wushao Wen (SUN YAT-SEN UNIVERSITY) · Liang Lin (Sun Yat-sen University) · Marinka Zitnik (Harvard University) · Pan Zhou (Sea Group)
LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes
Yanwen Guo (Nanjing University) · Yuanqi Li (Nanjing University) · Dayong Ren (Nanjing University) · Xiaohong Zhang · Jiawei Li (Nanjing University) · Liang Pu · Changfeng Ma (Nanjing University) · xiaoyu zhan (Nanjing University) · Jie Guo (Nanjing University) · Mingqiang Wei (Nanjing University of Aeronautics and Astronautics) · Yan Zhang · Piaopiao Yu (Nanjing University) · Shuangyu Yang (Nanjing University) · Donghao Ji (Nanjing University) · Huisheng Ye (Nanjing University) · Hao Sun (Nanjing University) · Yansong Liu (Nanjing University) · Yinuo Chen (Nanjing University) · Jiaqi Zhu (Nanjing University) · Hongyu Liu (Nanjing University)
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Yiwen Chen (Nanyang Technological University) · Zilong Chen (Tsinghua University) · Chi Zhang (Tencent) · Feng Wang (Tsinghua University) · Xiaofeng Yang (Nanyang Technological University) · Yikai Wang (Tsinghua University) · Zhongang Cai (Nanyang Technological University) · Lei Yang (The Chinese University of Hong Kong) · Huaping Liu (Tsinghua University) · Guosheng Lin (Nanyang Technological University)
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen (University of Adelaide) · Yuyuan Liu (University of Adelaide) · Hu Wang (The University of Adelaide) · Fengbei Liu (Cornell University) · Chong Wang (University of Adelaide) · Helen Frazer (BreastScreen Victoria) · Gustavo Carneiro (University of Surrey)
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios
Jie Xu (University of Electronic Science and Technology of China) · Yazhou Ren (University of Electronic Science and Technology of China) · Xiaolong Wang (University of Electronic Science and Technology of China) · Lei Feng (Nanyang Technological University) · Zheng Zhang (Harbin Institute of Technology) · Gang Niu (RIKEN) · Xiaofeng Zhu (University of Electronic Science and Technology of China)
Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning
Tung Le (University of California, Irvine) · Khai Nguyen (UT Austin) · shanlin sun (University of California, Irvine) · Nhat Ho (University of Texas, Austin) · Xiaohui Xie (University of California, Irvine)
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis (Layer 6 AI) · Zhaoyan Liu (Layer6 AI) · Satya Krishna Gorti (Layer6 AI) · Valentin Villecroze (Layer 6) · Jesse C. Cresswell (Layer 6 AI) · Guangwei Yu (Layer6 AI) · Gabriel Loaiza-Ganem (Layer 6 AI) · Maksims Volkovs (Layer6 AI)
AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement
Shiwei Jin · Zhen Wang (Qualcomm Technologies, Inc.) · Lei Wang (Qualcomm) · Peng Liu (Qualcomm Inc.) · Ning Bi (Qualcomm) · Truong Nguyen (University of California, San Diego)
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
Jakub Paplham (Czech Technical University in Prague) · Vojtech Franc (Czech Technical University in Prague, Faculty of Electrical Engineering)
From a Bird’s Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian (Tianjin University) · Ruize Han (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Wei Feng (Tianjin University) · Song Wang (University of South Carolina)
HOI-M$^3$: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang (ShanghaiTech University) · Jingyan Zhang (ShanghaiTech University) · Zining Song (ShanghaiTech University) · Zhanhe Shi (ShanghaiTech University) · Chengfeng Zhao (ShanghaiTech University) · Ye Shi (ShanghaiTech University) · Jingyi Yu (ShanghaiTech University) · Lan Xu (ShanghaiTech University) · Jingya Wang (ShanghaiTech University)
Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Yakun Chang · Yeliduosi Xiaokaiti (Peking University) · Yujia Liu (School of Computer Science, Peking University, Beijing, China) · Bin Fan · Zhaojun Huang (Peking University) · Tiejun Huang (Peking University) · Boxin Shi (Peking University)
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang (Ohio State University, Columbus) · Wenbin He (Bosch) · Xiwei Xuan (University of California Davis) · Clint Sebastian (Bosch) · Jorge Piazentin Ono (Bosch) · Xin Li (Bosch Research) · Sima Behpour (Bosch Center for Artificial Intelligence (BCAI)) · Thang Doan (Bosch Center for Artificial Intelligence) · Liang Gou (Bosch) · Shen (Ohio State University) · Liu Ren (Bosch Research)
Solving Masked Jigsaw Puzzles with Diffusion Transformers
Jinyang Liu (Northeastern University) · Wondmgezahu Teshome (Northeastern University) · Sandesh Ghimire (Qualcomm) · Mario Sznaier (Northeastern University) · Octavia Camps (Northeastern University)
RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang (ETH Zurich) · Bharat Lal Bhatnagar (Meta) · Yuanlu Xu (Meta Reality Labs Research) · Alexander Winkler (Meta) · Petr Kadlecek (Meta Reality Labs Research) · Siyu Tang (ETH Zurich) · Federica Bogo (Meta)
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
Yu-Ying Yeh (University of California, San Diego) · Jia-Bin Huang (University of Maryland, College Park) · Changil Kim (Facebook) · Lei Xiao · Thu Nguyen-Phuoc (Reality Labs Research, Meta) · Numair Khan · Cheng Zhang (Facebook) · Manmohan Chandraker (UC San Diego) · Carl Marshall (Reality Labs Research) · Zhao Dong (Meta RL Research) · Zhengqin Li (Facebook)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang (Shanghai Jiao Tong University) · Aofan Jiang (Shanghai Jiao Tong University) · Jinghao Feng (Shanghai Jiao Tong University) · Ya Zhang (Shanghai Jiao Tong University) · Xinchao Wang (National University of Singapore) · Yanfeng Wang (Shanghai Jiao Tong University)
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang (The Chinese University of Hong Kong) · Shengju Qian (The Chinese University of Hong Kong) · Bohao Peng (The Chinese University of Hong Kong) · Shu Liu (The Chinese University of Hong Kong) · Jiaya Jia (The Chinese University of Hong Kong)
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X Chen (University of California, Santa Barbara) · Yaron Vaxman (Cloudinary) · Elad Ben Baruch (Cloudinary) · David Asulin (Cloudinary Ltd.) · Aviad Moreshet (Cloudinary) · Kuo-Chin Lien (Layer AI) · Misha Sra (University of California, Santa Barbara) · Pradeep Sen (UC Santa Barbara)
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Han Liang (ShanghaiTech University) · Jiacheng Bao (ShanghaiTech University) · Ruichi Zhang (ShanghaiTech University) · Sihan Ren (ShanghaiTech University) · Yuecheng Xu (ShanghaiTech University) · Sibei Yang · Xin Chen (University of Chinese Academy of Sciences, ShanghaiTech University) · Jingyi Yu (ShanghaiTech University) · Lan Xu (ShanghaiTech University)
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang (Beijing Institute of Technology) · Yixin Chen (BIGAI) · Baoxiong Jia (University of California, Los Angeles) · Puhao Li (Department of Automation, Tsinghua University) · Jinlu Zhang (Peking University) · Jingze Zhang (Tsinghua University) · Tengyu Liu · Yixin Zhu (Peking University) · Wei Liang (Beijing Institute of Technology) · Siyuan Huang (Beijing Institute of General Artificial Intelligence)
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation
Jia-Hao Wu (National Yang Ming Chiao Tung University) · Fu-Jen Tsai (National Tsing Hua University) · Yan-Tsung Peng (National Chengchi University) · Charles Tsai (Qualcomm Inc.) · Chia-Wen Lin (National Tsing Hua University) · Yen-Yu Lin (National Yang Ming Chiao Tung University)
Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households
Zhihao Cao (Tsinghua University) · ZiDong Wang (Department of Automation, Tsinghua University) · Siwen Xie (Peking University) · Anji Liu (University of California, Los Angeles) · Lifeng Fan (Beijing Institute of General Artificial Intelligence)
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan (University of Maryland, College Park) · Fuxiao Liu (University of Maryland) · Xiyang Wu (University of Maryland, College Park) · Ruiqi Xian (University of Maryland, College Park) · Zongxia Li (University of Maryland, College Park) · Xiaoyu Liu (University of Maryland, College Park) · Xijun Wang (University of Maryland, College Park) · Lichang Chen (Department of Computer Science, University of Maryland, College Park) · Furong Huang (Department of Computer Science, University of Maryland) · Yaser Yacoob (University of Maryland, College Park) · Dinesh Manocha (University of Maryland, College Park) · Tianyi Zhou (University of Maryland, College Park)
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
Chaokang Jiang · Guangming Wang (University of Cambridge) · Jiuming Liu (Shanghai Jiao Tong University) · Hesheng Wang (Shanghai Jiao Tong University) · Zhuang Ma (PhiGent) · Zhenqiang Liu · LIANG · Yi Shan (PhiGent Robotics) · Dalong Du (PhiGent Robotics)
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
Jiuming Liu (Shanghai Jiao Tong University) · Guangming Wang (University of Cambridge) · Weicai Ye (Zhejiang University) · Chaokang Jiang · Jinru Han (Shanghai Jiao Tong University) · Zhe Liu (Shanghai Jiaotong University) · Guofeng Zhang (Zhejiang University) · Dalong Du (PhiGent Robotics) · Hesheng Wang (Shanghai Jiao Tong University)
CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs
Haocheng Yuan (University of Edinburgh) · Jing Xu (University of Edinburgh) · Hao Pan (Microsoft Research) · Adrien Bousseau (INRIA) · Niloy J. Mitra (University College London) · Changjian Li (University of Edinburgh)
Looking 3D: Anomaly Detection with 2D-3D Alignment
Ankan Kumar Bhunia (The University of Edinburgh) · Changjian Li (University of Edinburgh) · Hakan Bilen (University of Edinburgh)
ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention
Jiawei Wang (Shandong University) · Changjian Li (University of Edinburgh)
Active Domain Adaptation with False Negative Prediction for Object Detection
Yuzuru Nakamura (Panasonic Holdings Corporation) · Yasunori Ishii (Panasonic Holdings Corporation) · Takayoshi Yamashita (Chubu University)
No More Ambiguity in 360$^\circ$ Room Layout via Bi-Layout Estimation
Yu-Ju Tsai (University of California, Merced) · Jin-Cheng Jhang (National Tsing Hua University) · JINGJING ZHENG · Wei Wang (Amazon) · Albert Chen (Amazon) · Min Sun (Amazon/NTHU) · Cheng-Hao Kuo (Amazon) · Ming-Hsuan Yang (University of California at Merced)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang (Tsinghua University) · Ruiyuan Gao (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Kai Chen (The Hong Kong University of Science and Technology) · Kaiqiang Zhou (Huawei Technologies Ltd.) · Yingjie CAI (The Chinese University of Hong Kong) · Lanqing Hong (Huawei Technologies Ltd.) · Zhenguo Li (Huawei) · Lihui Jiang (Huawei Technologies Ltd.) · Dit-Yan Yeung (Hong Kong University of Science and Technology) · Qiang Xu (The Chinese University of Hong Kong) · Kai Zhang (Shenzhen International Graduate School, Tsinghua University)
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
Junyi Yao · Yijiang Liu (Nanjing University) · Zhen Dong (PhD/Postdoc UC Berkeley) · Mingfei Guo (Stanford University) · Helan Hu (Peking University) · Kurt Keutzer (EECS, UC Berkeley) · Li Du (Nanjing University) · Daquan Zhou (National University of Singapore) · Shanghang Zhang (Peking University)
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Mingyu Lee (Chung-Ang University, LGCNS) · Jongwon Choi (Chung-Ang University)
TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video
Minye Wu (KU Leuven) · Zehao Wang (KU Leuven) · Georgios Kouros (Department of Electrical Engineering, KU Leuven, Belgium) · Tinne Tuytelaars (KU Leuven)
HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Li Pang (Xi'an Jiaotong University) · Xiangyu Rui (Xi'an Jiaotong University) · Long Cui (Xi'an Jiaotong University) · Hongzhong Wang (Xi'an Jiaotong University) · Deyu Meng · Xiangyong Cao (Xi'an Jiaotong University)
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao (Shanghai Jiao Tong University) · Mengxi Chen (Shanghai Jiaotong University) · Tianjie Dai (Shanghai Jiao Tong University) · Jiangchao Yao (Shanghai Jiaotong University) · Bo Han (HKBU) · Ya Zhang (Shanghai Jiao Tong University) · Yanfeng Wang (Shanghai Jiao Tong University)
Learning from Synthetic Human Group Activities
Che-Jui Chang (Rutgers University) · Danrui Li (Rutgers University) · Deep Patel (NEC Laboratories America) · Parth Goel (Oracle) · Seonghyeon Moon (Roblox) · Samuel Sohn (Rutgers University) · Honglu Zhou (Rutgers University) · Sejong Yoon (The College of New Jersey) · Vladimir Pavlovic (Rutgers University) · Mubbasir Kapadia (Rutgers University)
Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline
Xiaoqi Zhao (Dalian University of Technology) · Youwei Pang (Dalian University of Technology) · Zhenyu Chen (Dalian University of Technology) · Qian Yu (Dalian University of Technology) · Lihe Zhang (Dalian University of Technology) · Hanqi Liu (Ohio State University, Columbus) · Jiaming Zuo (University of Southern California) · Huchuan Lu (Dalian University of Technology)
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
Kaiyu Song (SUN YAT-SEN UNIVERSITY) · Hanjiang Lai (SUN YAT-SEN UNIVERSITY) · Yan Pan (SUN YAT-SEN UNIVERSITY) · Jian Yin
Cinematic Behavior Transfer via NeRF-based Differentiable Filming
Xuekun Jiang (Shanghai Artificial Intelligence Laboratory) · Anyi Rao (Stanford University) · Jingbo Wang (Shanghai AI LAB) · Dahua Lin (The Chinese University of Hong Kong) · Bo Dai (Shanghai AI Laboratory)
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman (University of Texas at Austin) · Andrew Westbury (Facebook AI Research) · Lorenzo Torresani (Facebook) · Kris Kitani (Carnegie Mellon University) · Jitendra Malik (University of California at Berkeley) · Triantafyllos Afouras (University of Oxford) · Kumar Ashutosh (UT Austin & FAIR, Meta) · Vijay Baiyya (University of Louisiana at Lafayette) · Siddhant Bansal (University of Bristol, UK) · Bikram Boote (University of Illinois, Urbana Champaign) · Eugene Byrne (Meta) · Zachary Chavis (University of Minnesota) · Joya Chen (National University of Singapore) · Feng Cheng (University of North Carolina at Chapel Hill) · Fu-Jen Chu (Facebook) · Sean Crane (School of Computer Science, Carnegie Mellon University) · Avijit Dasgupta (IIIT Hyderabad) · Jing Dong (Meta) · Maria Escobar (Universidad de Los Andes) · Cristhian David Forigua Diaz (Reblink) · Abrham Gebreselasie (Carnegie Mellon University) · Sanjay Haresh (Qualcomm Inc.) · Jing Huang (Facebook) · Md Mohaiminul Islam (UNC Chapel Hill) · Suyog Jain (Meta) · Rawal Khirodkar (Meta) · Devansh Kukreja (Carnegie Mellon University) · Kevin Liang (FAIR at Meta) · Jia-Wei Liu (National University of Singapore) · Sagnik Majumder (UT Austin & Meta AI) · Yongsen Mao (Simon Fraser University) · Miguel Martin (Meta Platforms, Inc.) · Effrosyni Mavroudi · Tushar Nagarajan (Meta) · Francesco Ragusa · Santhosh Kumar Ramakrishnan (University of Texas, Austin) · Luigi Seminara (University of Catania) · Arjun Somayazulu (University of Texas at Austin) · Yale Song (Meta) · Shan Su (University of Pennsylvania) · Zihui Xue · Edward Zhang (University of Pennsylvania) · Jinxu Zhang (University of Pennsylvania) · Angela Castillo (Universidad de Los Andes) · Changan Chen (University of Texas at Austin) · Fu Xinzhu (National University of Singapore) · Ryosuke Furuta (The University of Tokyo) · Cristina González (Universidad de Los Andes) · Gupta · Jiabo Hu (Facebook) · Yifei Huang (The University of Tokyo) · Yiming Huang (University of Pennsylvania) · Weslie Khoo (Indiana University) · Anush Kumar (Torc Robotics) · Robert Kuo (Facebook) · Sach Lakhavani · Miao Liu (META AI) · Mi Luo (The University of Texas at Austin) · Zhengyi Luo (Carnegie Mellon University) · Brighid Meredith (Meta) · Austin Miller (Meta) · Oluwatumininu Oguntola (University of North Carolina at Chapel Hill) · Xiaqing Pan (Meta) · Penny Peng (Meta) · Shraman Pramanick · Merey Ramazanova (KAUST) · Fiona Ryan (Georgia Institute of Technology) · Wei Shan (University of North Carolina at Chapel Hill) · Kiran Somasundaram · Chenan Song (National University of Singapore) · Audrey Southerland (Georgia Institute of Technology) · Masatoshi Tateno (AIST, National Institute of Advanced Industrial Science and Technology) · Huiyu Wang (Facebook) · Yuchen Wang (Indiana University) · Takuma Yagi · Mingfei Yan · Xitong Yang (Meta) · Zecheng Yu (University of Tokyo) · Shengxin Zha (Meta GenAI) · Chen Zhao (King Abdullah University of Science and Technology (KAUST)) · Ziwei Zhao (Indiana University) · Zhifan Zhu (University of Bristol) · Jeff Zhuo (University of North Carolina at Chapel Hill) · Pablo ARBELAEZ (Universidad de los Andes) · Gedas Bertasius (UNC Chapel Hill) · Dima Damen (University of Bristol and Google DeepMind) · Jakob Engel (Research, Meta Reality Labs) · Giovanni Maria Farinella (University of Catania, Italy) · Antonino Furnari (University of Catania) · Bernard Ghanem (KAUST) · Judy Hoffman (Georgia Institute of Technology) · C.V. Jawahar (IIIT-Hyderabad) · Richard Newcombe (Meta, Reality Labs Research) · Hyun Soo Park (The University of Minnesota) · James Rehg · Yoichi Sato (University of Tokyo) · Manolis Savva (Simon Fraser University) · Jianbo Shi · Mike Zheng Shou (National University of Singapore) · Michael Wray (University of Bristol)
Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong (University of Sydney) · Minjing Dong (City University of Hong Kong) · Siqi Ma (University of New South Wales) · Seyit Camtepe (CSIRO) · Surya Nepal (CSIRO) · Chang Xu (University of Sydney)
$360+x$: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen (University of Birmingham) · Yuqi Hou (University of Birmingham) · Chenyuan Qu (University of Birmingham) · Irene Testini (Cardiff University) · Xiaohan Hong (University of Birmingham) · Jianbo Jiao (University of Birmingham)
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery
Yuqi Zhang (The Chinese University of Hong Kong, Shenzhen) · Guanying Chen (The Chinese University of Hong Kong, Shenzhen) · Jiaxing Chen (Sun Yat-Sen University) · Shuguang Cui (The Chinese University of Hong Kong, Shenzhen)
Any-Shift Prompting for Generalization over Distributions
Zehao Xiao (University of Amsterdam) · Jiayi Shen (University of Amsterdam) · Mohammad Mahdi Derakhshani (University of Amsterdam) · Shengcai Liao (Inception Institute of Artificial Intelligence) · Cees G. M. Snoek (University of Amsterdam)
BioCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens (Ohio State University, Columbus) · Jiaman Wu (Ohio State University, Columbus) · Matthew Thompson (Ohio State University, Columbus) · Elizabeth Campolongo (The Ohio State University) · Chan Hee Song (The Ohio State University) · David Carlyn (Ohio State University) · Li Dong (Microsoft Research) · Wasila Dahdul (University of California, Irvine) · Charles Stewart (Rensselaer Polytechnic Institute) · Tanya Berger-Wolf (Ohio State University) · Wei-Lun Chao (Ohio State University) · Yu Su (Ohio State University)
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Harsh Rangwani (Indian Institute of Science) · Pradipto Mondal (Indian Institute of Technology, Kharagpur) · Mayank Mishra (Carnegie Mellon University) · Ashish Asokan (Indian Institute of Science, Bangalore) · R. Venkatesh Babu (Indian Institute of Science)
Higher-order Relational Reasoning for Pedestrian Trajectory Prediction
Sungjune Kim (Korea University) · Hyung-gun Chi (Purdue University) · Hyerin Lim (Hyundai Motor Company) · Karthik Ramani (Purdue University) · Jinkyu Kim (Korea University) · Sangpil Kim (Korea University)
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
Dongliang Cao (University of Bonn) · Marvin Eisenberger (Technical University Munich) · Nafie El Amrani (University of Bonn) · Daniel Cremers (Technical University Munich) · Florian Bernard (University of Bonn)
H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration
MORTEZA GHAHREMANI (Technische Universität München) · Mohammad Khateri (University of Eastern Finland) · Bailiang Jian (Technische Universität München) · Benedikt Wiestler (Technical University Munich) · Ehsan Adeli (Stanford University) · Christian Wachinger (Technische Universität München)
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen (School of Computer Science, University of Illinois at Urbana-Champaign) · Karan Sikka (SRI International) · Michael Cogswell (SRI International) · Heng Ji (University of Illinois, Urbana-Champaign) · Ajay Divakaran (SRI International)
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Taeckyung Lee (KAIST) · Sorn Chottananurak (KAIST) · Taesik Gong (Bell Labs) · Sung-Ju Lee (Korea Advanced Institute of Science & Technology)
POPDG: Popular 3D Dance Generation with PopDanceSet
ZhenYe Luo (Beijing Normal University) · Min Ren (Beijing Normal University) · Xuecai Hu (Beijing Normal University) · Yongzhen Huang (Beijing Normal University) · Li Yao (Beijing Normal University)
Deep Equilibrium Diffusion Restoration with Parallel Sampling
Jiezhang Cao (ETH Zürich) · Yue Shi (Shanghai Jiao Tong University) · Kai Zhang · Yulun Zhang (Shanghai Jiao Tong University) · Radu Timofte (University of Würzburg) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
Wenbo Wang (ETHZ - ETH Zurich) · Hsuan-I Ho (ETHZ - ETH Zurich) · Chen Guo (ETH Zurich) · Boxiang Rong (ETHZ - ETH Zurich) · Artur Grigorev · Jie Song (ETHZ - ETH Zurich) · Juan Jose Zarate (Department of Computer Science, ETHZ - ETH Zurich) · Otmar Hilliges
Rich Human Feedback for Text-to-Image Generation
Youwei Liang (University of California, San Diego) · Junfeng He (Google) · Gang Li (Google) · Peizhao Li (GE HealthCare) · Arseniy Klimovskiy (Google) · Nicholas Carolan (Google) · Jiao Sun (University of Southern California) · Jordi Pont-Tuset (Google Research) · Sarah Young (Google) · Feng Yang (Google Research) · Junjie Ke · Krishnamurthy Dvijotham (Google DeepMind) · Katherine Collins (University of Cambridge) · Yiwen Luo (Research, Google) · Yang Li (Google) · Kai Kohlhoff (Google Research) · Deepak Ramachandran (Google) · Vidhya Navalpakkam (Research, Google)
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
Xiao Wang (Anhui University) · Shiao Wang · Chuanming Tang (University of Chinese Academy of Sciences) · Lin Zhu (Beijing Institute of Technology) · Bo Jiang (Anhui University) · Yonghong Tian (Peking University) · Jin Tang (Anhui University)
Learning from One Continuous Video Stream
Joao Carreira (DeepMind) · Michael King (Fit) · Viorica Patraucean (DeepMind) · Dilara Gokay (Google DeepMind) · Catalin Ionescu (Google) · Yi Yang (DeepMind) · Daniel Zoran (DeepMind) · Joseph Heyward (Google) · Carl Doersch (DeepMind) · Yusuf Aytar (Google DeepMind) · Dima Damen (University of Bristol and Google DeepMind) · Andrew Zisserman (University of Oxford)
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk · Jaesung Huh (University of Oxford) · Evangelos Kazakos (Czech Technical University of Prague) · Andrew Zisserman (University of Oxford) · Dima Damen (University of Bristol and Google DeepMind)
Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping
Peng Sun (Hunan University) · Xinyang Liu (Hong Kong Polytechnic University) · Zhibo Wang (Zhejiang University) · Bo Liu (Shenzhen Institute of Artificial Intelligence and Robotics for Society)
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Tianyu Huang (Harbin Institute of Technology & City University of Hong Kong) · Yihan Zeng (Huawei Technologies Ltd.) · Zhilu Zhang (Harbin Institute of Technology) · Wan Xu (Harbin Institute of Technology) · Hang Xu (Huawei Noah's Ark Lab) · Songcen Xu (Huawei Noah's Ark Lab) · Rynson W.H. Lau (City University of Hong Kong) · Wangmeng Zuo (Harbin Institute of Technology)
Backpropagation-free Network for 3D Test-time Adaptation
YANSHUO WANG (CSIRO) · Ali Cheraghian (CSIRO) · Zeeshan Hayder (CSIRO) · JIE HONG (Australian National University) · Sameera Ramasinghe (Amazon) · Shafin Rahman (North South University) · David Ahmedt-Aristizabal (CSIRO) · Xuesong Li (Australian National University) · Lars Petersson (CSIRO) · Mehrtash Harandi (Monash University)
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Jia-Wei Liu (National University of Singapore) · Yan-Pei Cao (Tencent ARC Lab) · Jay Zhangjie Wu (National University of Singapore) · Weijia Mao (NUS) · Yuchao Gu (None) · Rui Zhao (None) · Jussi Keppo (National University of Singapore) · Ying Shan (Tencent) · Mike Zheng Shou (National University of Singapore)
I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
Chengfeng Zhao (ShanghaiTech University) · Juze Zhang (ShanghaiTech University) · Jiashen Du (None) · Ziwei Shan (ShanghaiTech University) · Junye Wang (ShanghaiTech University) · Jingyi Yu (Shanghai Tech University) · Jingya Wang (ShanghaiTech University) · Lan Xu (ShanghaiTech University)
FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
Jinglin Xu (University of Science and Technology Beijing) · Sibo Yin (Peking University) · Guohao Zhao (Peking University) · Zishuo Wang (None) · Yuxin Peng (Peking University)
X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
Shuofeng Sun (Beijing University of Posts and Telecommunications) · Yongming Rao (Tsinghua University) · Jiwen Lu (Tsinghua University) · Haibin Yan (Beijing University of Posts and Telecommunications)
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
Zihua Liu (Tokyo Institute of Technology) · Hiroki Sakuma (T2 Inc.) · Masatoshi Okutomi (Tokyo Institute of Technology)
Readout Guidance: Learning Control from Diffusion Features
Grace Luo (University of California, Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Oliver Wang (Adobe Research) · Dan B Goldman (None) · Aleksander Holynski (UC Berkeley & Google Research)
NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
Minh-Tuan Tran (Monash University) · Trung Le (Monash University) · Xuan-May Le (University of Melbourne) · Mehrtash Harandi (Monash University) · Quan Tran (servicenow) · Dinh Phung (Monash University)
Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou (East China Normal University) · Yutong Ye (None) · Pengyu Zhang (East China Normal University) · Xian Wei (Chinese Academy of Sciences) · Mingsong Chen (East China Normal University)
NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
weining ren (ETHz) · Zihan Zhu (ETHZ - ETH Zurich) · Boyang Sun (ETH Zurich) · Jiaqi Chen (ETHZ - ETH Zurich) · Marc Pollefeys (ETH Zurich / Microsoft) · Songyou Peng (ETH Zurich & MPI Tübingen)
Systematic comparison of semi-supervised and self-supervised learning for medical image classification
Zhe Huang (Tufts University) · Ruijie Jiang (Tufts University) · Shuchin Aeron (Tufts University) · Michael C. Hughes (Tufts University)
Decoupled Pseudo-labeling in Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang (SUN YAT-SEN UNIVERSITY) · Jiaming Li (Baidu) · Xiangru Lin (Baidu) · Wei Zhang (Baidu) · Xiao Tan (Baidu) · Junyu Han (Baidu) · Errui Ding (Baidu Inc.) · Jingdong Wang (Baidu) · Guanbin Li (Sun Yat-sen University)
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
Jiaming Li (Baidu) · Jiacheng Zhang (SUN YAT-SEN UNIVERSITY) · Jichang Li (The University of Hong Kong) · Ge Li (Peking University Shenzhen Graduate School) · Si Liu (Beihang University) · Liang Lin (Sun Yat-sen University) · Guanbin Li (Sun Yat-sen University)
Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World
Wen Yin (Huazhong University of Science and Technology) · Jian Lou (Zhejiang University) · Pan Zhou (Huazhong University of Science and Technology) · Yulai Xie (Huazhong University of Science and Technology) · Dan Feng (Huazhong University of Science and Technology) · Yuhua Sun (None) · Tailai Zhang (Huazhong University of Science and Technology) · Lichao Sun (Lehigh University)
MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
Yixin Liu (Lehigh University) · Chenrui Fan (Huazhong University of Science and Technology) · Yutong Dai (Lehigh University) · Xun Chen (Samsung Research America) · Pan Zhou (Huazhong University of Science and Technology) · Lichao Sun (Lehigh University)
ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models
Fei Kong (University of Electronic Science and Technology of China) · Jinhao Duan (Drexel University) · Lichao Sun (Lehigh University) · Hao Cheng (Hong Kong University of Science and Technology(Guangzhou)) · Renjing Xu (Hong Kong University of Science and Technology (Guangzhou)) · Heng Tao Shen (University of Electronic Science and Technology of China) · Xiaofeng Zhu (University of Electronic Science and Technology of China) · Xiaoshuang Shi (University of Electronic Science and Technology of China) · Kaidi Xu (Drexel University)
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi (Google) · Skanda Koppula (Google Deepmind) · Shreya Pathak (Google) · Justin Chiu (Google) · Joseph Heyward (Google) · Viorica Patraucean (DeepMind) · Jiajun Shen (DeepMind) · Antoine Miech (DeepMind) · Andrew Zisserman (University of Oxford) · Aida Nematzadeh (Google Deepmind)
LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset
Haolin Liu (The Chinese University of Hong Kong, Shenzhen) · Chongjie Ye (The Chinese University of Hong Kong, Shenzhen) · Yinyu Nie (Huawei Technologies Ltd.) · Yingfan He (Chinese University of Hong Kong, Shenzhen) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Marianna Ohanyan (Picsart) · Hayk Manukyan (Picsart AI Research) · Zhangyang Wang (University of Texas at Austin) · Shant Navasardyan (Picsart AI Research) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
Region-Based Representations Revisited
Michal Shlapentokh-Rothman (University of Illinois at Urbana Champaign) · Ansel Blume (University of Illinois Urbana Champaign) · Yao Xiao (University of Illinois at Urbana-Champaign) · Yuqun Wu (Department of Computer Science) · Sethuraman T V (Department of Computer Science) · Heyi Tao (University of Illinois at Urbana-Champaign) · Jae Yong Lee (University of Illinois at Urbana-Champaign) · Wilfredo Torres-Calderon (Reconstruct) · Yu-Xiong Wang (None) · Derek Hoiem (University of Illinois at Urbana-Champaign)
NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
Zinuo You (ETH Zurich) · Andreas Geiger (University of Tübingen) · Anpei Chen (Department of Computer Science, ETHZ - ETH Zurich)
Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes
Zhiyuan Yu (Na) · Zheng Qin (National University of Defense Technology) · lintao zheng (National University of Defense Technology) · Kai Xu (National University of Defense Technology)
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
ChangHee Yang (LG Electronics) · ChanHee Kang (Sogang University) · Kyeongbo Kong (Pusan National University) · Hanni Oh (Sogang University) · Suk-Ju Kang (Sogang University)
Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention
Ju-Hyeon Nam (Inha University) · Nur Suriza Syazwany (Inha University) · Su Jung Kim (Inha University) · Sang-Chul Lee (Inha University)
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
Yuwen Tan (Huazhong University of Science and Technology) · Qinhao Zhou (Huazhong University of Science and Technology) · Xiang Xiang (Huazhong University of Science and Technology) · Ke Wang (Alibaba Group) · Yuchuan Wu (Alibaba Group) · Yongbin Li (Alibaba Group)
Learning Multi-dimensional Human Preference for Text-to-Image Generation
Sixian Zhang (None) · Bohan Wang (Kuaishou) · Junqiang Wu (Kuaishou) · Yan Li (kuaishou) · Tingting Gao (China Agricultural University) · Di ZHANG (Kuaishou Technology) · Zhongyuan Wang (Kuaishou Inc.)
A theory of volumetric representations for opaque solids
Bailey Miller (Carnegie Mellon University) · Hanyu Chen (Carnegie Mellon University) · Alice Lai (Carnegie Mellon University) · Ioannis Gkioulekas (Carnegie Mellon University)
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu (Department of Computer Science, University of Washington) · Yingwei Li (Google) · Nan Liu (Google) · Hao Peng (Google) · Dawei Yang (Google Inc.) · Ira Kemelmacher-Shlizerman (UW + Google)
Generative Powers of Ten
Xiaojuan Wang (Department of Computer Science) · Janne Kontkanen (Research, Google) · Brian Curless (University of Washington) · Steve Seitz (University of Washington) · Ira Kemelmacher-Shlizerman (UW + Google) · Ben Mildenhall (Google) · Pratul P. Srinivasan (Google Research) · Dor Verbin (None) · Aleksander Holynski (UC Berkeley & Google Research)
Total Selfie: Generating Full-Body Selfies
Bowei Chen (University of Washington) · Brian Curless (University of Washington) · Ira Kemelmacher-Shlizerman (UW + Google) · Steve Seitz (University of Washington)
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He (None) · Hengduo Li (Meta AI) · Young Kyun Jang (Meta AI) · Menglin Jia (Facebook) · Xuefei Cao (Meta) · Ashish Shah (Meta) · Abhinav Shrivastava (University of Maryland) · Ser-Nam Lim (Meta AI)
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions
Namitha Padmanabhan (University of Maryland) · Matthew A Gwilliam (University of Maryland, College Park) · Pulkit Kumar (None) · Shishira R Maiya (University of Maryland) · Max Ehrlich (University of Maryland, College Park) · Abhinav Shrivastava (University of Maryland)
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham (University of Maryland, College Park) · Chuong Huynh (University of Maryland, College Park) · Ser-Nam Lim (Meta AI) · Abhinav Shrivastava (University of Maryland)
MaGGIe: Masked Guided Gradual Human Instance Matting
Chuong Huynh (University of Maryland, College Park) · Seoung Wug Oh (Adobe Systems) · Abhinav Shrivastava (University of Maryland) · Joon-Young Lee (Adobe Research)
Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
Nirat Saini (University of Maryland College Park) · Khoi Pham (University of Maryland, College Park) · Abhinav Shrivastava (University of Maryland)
Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
Gaurav Shrivastava (Department of Computer Science, University of Maryland, College Park) · Abhinav Shrivastava (University of Maryland)
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives
Ronghui Li (Tsinghua University) · Yuxiang Zhang (Tsinghua University) · Yachao Zhang (Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Jie Guo (Peng Cheng Laboratory) · Yan Zhang (ETH Zurich) · Yebin Liu (Tsinghua University) · Xiu Li (Tsinghua University)
DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields
Cheng-You Lu (University of Technology Sydney) · Peisen Zhou (Brown University) · Angela Xing (Brown University) · Chandradeep Pokhariya (International Institute of Information Technology, Hyderabad) · Arnab Dey (I3S-CNRS/Université Côte D'Azur) · Ishaan Shah (International Institute of Information Technology, Hyderabad) · Rugved Mavidipalli (Brown University) · Dylan Hu (Brown University) · Andrew Comport (CNRS) · Kefan Chen (Brown University) · Srinath Sridhar (None)
Class Incremental Learning with Multi-Teacher Distillation
Haitao Wen (University of Electronic Science and Technology of China) · Lili Pan (University of Electronic Science and Technology of China) · Yu Dai (University of Electronic Science and Technology of China) · Heqian Qiu (University of Electronic Science and Technology of China) · Lanxiao Wang (University of Electronic Science and Technology of China) · Qingbo Wu (University of Electronic Science and Technology of China) · Hongliang Li (University of Electronic Science and Technology of China, Tsinghua University)
Prompt-Driven Referring Image Segmentation with Instance Contrasting
Chao Shang (None) · Zichen Song (University of Electronic Science and Technology of China) · Heqian Qiu (University of Electronic Science and Technology of China) · Lanxiao Wang (University of Electronic Science and Technology of China) · Fanman Meng (University of Electronic Science and Technology of China) · Hongliang Li (University of Electronic Science and Technology of China, Tsinghua University)
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero (Polytechnic Institute of Turin) · Gabriele Rosi (Polytechnic Institute of Turin - Focoos AI) · Claudia Cuttano (Polytechnic Institute of Turin) · Francesca Pistilli (Polytechnic Institute of Turin) · Marco Ciccone (Politecnico di Torino) · Giuseppe Averta (Polytechnic of Turin) · Fabio Cermelli (Politecnico di Torino)
ViewDiff: 3D-Consistent Image Generation with Text-To-Image Models
Lukas Hoellein (None) · Aljaž Božič (Facebook) · Norman Müller (Meta) · David Novotny (Facebook) · Hung-Yu Tseng (Meta) · Christian Richardt (Meta Reality Labs) · Michael Zollhoefer (Meta) · Matthias Nießner (Technical University of Munich)
ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing
Jun-Kun Chen (None) · Samuel Rota Bulò (Meta) · Norman Müller (Meta) · Lorenzo Porzi (Facebook) · Peter Kontschieder (Meta) · Yu-Xiong Wang (None)
MultiDiff: Consistent Novel View Synthesis from a Single Image
Norman Müller (Meta) · Katja Schwarz (University of Tuebingen) · Barbara Roessle (Technische Universität München) · Lorenzo Porzi (Facebook) · Samuel Rota Bulò (Meta) · Matthias Nießner (Technical University of Munich) · Peter Kontschieder (Meta)
Cross-view and Cross-pose Completion for 3D Human Understanding
Matthieu Armando (Naver Labs Europe) · Salma Galaaoui (Naver Labs Europe) · Fabien Baradel (NAVER LABS Europe) · Thomas Lucas (Naver Labs Europe) · Vincent Leroy (Naver Labs Europe) · Romain BRÉGIER (None) · Philippe Weinzaepfel (Naver Labs Europe) · Grégory Rogez (Naver Labs Europe)
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki (Carnegie Mellon University) · Vasu Agrawal (Meta Reality Labs Research) · Samuel Rota Bulò (Meta) · Lorenzo Porzi (Facebook) · Peter Kontschieder (Meta) · Deva Ramanan (Carnegie Mellon University) · Michael Zollhoefer (Meta) · Christian Richardt (Meta Reality Labs)
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
ZHIXIANG WEI (University of science and technology of china) · Lin Chen (University of Science and Technology of China) · Xiaoxiao Ma (University of Science and Technology of China) · Huaian Chen (University of Science and Technology of China) · Tianle Liu (University of Science and Technology of China) · Pengyang Ling (University of Science and Technology of China) · Jinjin Zheng (University of Science and Technology of China) · Ben Wang (University of Science and Technology of China) · Yi Jin (University of Science and Technology of China)
Learning Degradation Independent Representations for Camera ISP Pipelines
Yanhui Guo (McMaster University) · Fangzhou Luo (McMaster University) · Xiaolin Wu (McMaster University)
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
mude hui (University of California, Santa Cruz) · Zihao Wei (University of Michigan - Ann Arbor) · Hongru Zhu (None) · Fei Xia (Ecole Normale Supérieure de Paris) · Yuyin Zhou (UC Santa Cruz)
On Scaling up a Multilingual Vision and Language Model
Xi Chen (Google) · Josip Djolonga (Google) · Piotr Padlewski (Google) · Basil Mustafa (Google) · Soravit Changpinyo (Google Research) · Jialin Wu (Google) · Carlos Riquelme Ruiz (Google) · Sebastian Goodman (Google) · Xiao Wang (Google DeepMind) · Yi Tay (Google) · Siamak Shakeri (Research, Google) · Mostafa Dehghani (Google DeepMind) · Daniel Salz (Google) · Mario Lučić (Google) · Michael Tschannen (Google DeepMind) · Arsha Nagrani (Google) · Hexiang Hu (Google Deepmind) · Mandar Joshi (Google DeepMind) · Bo Pang (Google) · Ceslee Montgomery (Google) · Paulina Pietrzyk (Google) · Marvin Ritter (Google DeepMind) · AJ Piergiovanni (Google) · Matthias Minderer (Google) · Filip Pavetic (Google) · Austin Waters (Google) · Gang Li (Google) · Ibrahim Alabdulmohsin (Google) · Lucas Beyer (Google Brain/DM Zürich) · Julien Amelot (Research, Google) · Kenton Lee (Google Research) · Andreas Steiner (Google) · Yang Li (Google) · Daniel Keysers (Google DeepMind) · Anurag Arnab (Google) · Yuanzhong Xu (Google) · Keran Rong (Google Deepmind) · Alexander Kolesnikov (Google) · Mojtaba Seyedhosseini (Google) · Anelia Angelova (Google) · Xiaohua Zhai (Google) · Neil Houlsby (Google) · Radu Soricut (Google)
NeRFiller: Completing Scenes via Generative 3D Inpainting
Ethan Weber (University of California Berkeley) · Aleksander Holynski (UC Berkeley & Google Research) · Varun Jampani (Google Research) · Saurabh Saxena (None) · Noah Snavely (Google / Cornell) · Abhishek Kar (Google) · Angjoo Kanazawa (UC Berkeley)
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Evonne Ng (University of California, Berkeley) · Javier Romero (None) · Timur Bagautdinov (Reality Labs Research) · Shaojie Bai (Meta) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Angjoo Kanazawa (UC Berkeley) · Alexander Richard (Reality Labs Research, Meta)
GARField: Group Anything with Radiance Fields
Chung Min Kim (University of California, Berkeley) · Mingxuan Wu (None) · Justin Kerr (University of California Berkeley) · Ken Goldberg (University of California Berkeley) · Matthew Tancik (Luma AI) · Angjoo Kanazawa (UC Berkeley)
Generative Proxemics: A Prior for 3D Social Interaction from Images
Lea Müller (University of California, Berkeley) · Vickie Ye (University of California, Berkeley) · Georgios Pavlakos (University of Texas at Austin) · Michael J. Black (University of Tübingen) · Angjoo Kanazawa (UC Berkeley)
Reconstructing Hands in 3D with Transformers
Georgios Pavlakos (University of Texas at Austin) · Dandan Shan (None) · Ilija Radosavovic · Angjoo Kanazawa (UC Berkeley) · David Fouhey (New York University) · Jitendra Malik (University of California at Berkeley)
The More You See in 2D, the More You Perceive in 3D
Xinyang Han (UC Berkeley) · Zelin Gao · Angjoo Kanazawa (UC Berkeley) · Shubham Goel (Avataar) · Yossi Gandelsman (University of California, Berkeley)
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving
Brian Yang (School of Computer Science, Carnegie Mellon University) · Huangyuan Su (Computer Science, School of Engineering and Applied Sciences, Harvard University) · Nikolaos Gkanatsios (Carnegie Mellon University) · Tsung-Wei Ke (CMU, Carnegie Mellon University) · Ayush Jain (Carnegie Mellon University) · Jeff Schneider (Carnegie Mellon University) · Katerina Fragkiadaki (CMU)
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang (Department of Computer Science and Engineering, Hong Kong University of Science and Technology) · Qiuyu Wang (Ant Group) · Yuxi Xiao (Zhejiang University) · Qingyan Bai (Hong Kong University of Science and Technology) · Juntao Zhang (Hong Kong University of Science and Technology) · Kecheng Zheng (Ant Group) · Xiaowei Zhou (None) · Qifeng Chen (Hong Kong University of Science and Technology) · Yujun Shen (The Chinese University of Hong Kong)
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang (Hong Kong University of Science and Technology) · Yuehuai LIU (Hong Kong University of Science and Technology) · Yu-Wing Tai (None) · Chi-Keung Tang (The Hong Kong University of Science and Technology)
ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation
Khoi D Nguyen (University of Wisconsin - Madison) · Chen Li (National University of Singapore) · Gim Hee Lee (National University of Singapore)
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee (Vidyasirimedhi Institute of Science and Technology) · Worameth Chinchuthakun (Tokyo Institute of Technology) · Nontaphat Sinsunthithet (Vidyasirimedhi Institute of Science and Technology) · Varun Jampani (Google Research) · Amit Raj (Google) · Pramook Khungurn (Cornell University) · Supasorn Suwajanakorn (Vidyasirimedhi Institute of Science and Technology)
Adversarial Text to Continuous Image Generation
Kilichbek Haydarov (King Abdullah University of Science and Technology) · Aashiq Muhamed (CMU, Carnegie Mellon University) · Xiaoqian Shen (King Abdullah University of Science and Technology) · Jovana Lazarevic (University of Novi Sad) · Ivan Skorokhodov (KAUST) · Chamuditha Jayanga Galappaththige (Queensland University of Technology) · Mohamed Elhoseiny (KAUST)
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations
Maximilian Augustin (University of Tuebingen) · Yannic Neuhaus (Eberhard-Karls-Universität Tübingen) · Matthias Hein (University of Tübingen)
Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping
Alex Costanzino (University of Bologna) · Pierluigi Zama Ramirez (University of Bologna) · Giuseppe Lisanti (University of Bologna) · Luigi Di Stefano (University of Bologna)
Neural Visibility Field for Active Mapping
Shangjie Xue (Georgia Institute of Technology) · Jesse Dill (Georgia Institute of Technology) · Pranay Mathur (Georgia Institute of Technology) · Frank Dellaert (Georgia Tech) · Panagiotis Tsiotras (Georgia Institute of Technology) · Danfei Xu (Georgia Institute of Technology)
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen (University of California, Merced) · Aliaksandr Siarohin (Snap Inc.) · Willi Menapace (University of Trento) · Ekaterina Deyneka (Snap Inc.) · Hsiang-wei Chao (Snap Inc.) · Byung Jeon (Snap Inc.) · Yuwei Fang (Snap Inc.) · Hsin-Ying Lee (Snap Inc.) · Jian Ren (Snap Inc.) · Ming-Hsuan Yang (University of California at Merced) · Sergey Tulyakov (Snap Inc.)
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang (Shanghai Jiao Tong University) · Charles Herrmann (Google) · Junhwa Hur (Google) · Eric Chen (University of Illinois Urbana-Champaign) · Varun Jampani (Google Research) · Deqing Sun (Google) · Ming-Hsuan Yang (University of California at Merced)
VidToMe: Video Token Merging for Zero-Shot Video Editing
Xirui Li (Shanghai Jiaotong University) · Chao Ma (Shanghai Jiao Tong University) · Xiaokang Yang (Shanghai Jiao Tong University, China) · Ming-Hsuan Yang (University of California at Merced)
Multimodal Aerial Visual RECognition (MAVREC) Dataset: Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta (University of Central Florida) · Srijan Das (University of North Carolina at Charlotte) · Jacob Nielsen (University of Southern Denmark - SDU) · RAJATSUBHRA CHAKRABORTY (University of North Carolina at Charlotte) · Mubarak Shah (University of Central Florida)
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Suraj Patni (Indian Institute of Technology, Delhi) · Aradhye Agarwal (Indian Institute of Technology Delhi) · Chetan Arora (Indian Institute of Technology Delhi)
Convolutional Prompting meets Language Models for Continual Learning
ANURAG Roy (IIT Kharagpur) · Riddhiman Moulick (Indian Institute of Technology Kharagpur) · Vinay Verma Verma (None) · Saptarshi Ghosh (Indian Institute of Technology Kharagpur) · Abir Das (Indian Institute of Technology Kharagpur)
Boosting Neural Representations for Videos with a Conditional Decoder
XINJIE ZHANG (The Hong Kong University of Science and Technology) · Ren Yang (Microsoft Research Asia) · Dailan He (The Chinese University of Hong Kong) · Xingtong Ge (Beijing Institute of Technology) · Tongda Xu (Tsinghua University) · Yan Wang (Tsinghua University) · Hongwei Qin (SenseTime Co.) · Jun Zhang (The Hong Kong University of Science and Technology)
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge (Beijing Institute of Technology) · Jixiang Luo (sensetime) · XINJIE ZHANG (The Hong Kong University of Science and Technology) · Tongda Xu (Tsinghua University) · Guo Lu (Shanghai Jiaotong University) · Dailan He (The Chinese University of Hong Kong) · Jing Geng (Beijing Institute of Technology) · Yan Wang (Tsinghua University) · Jun Zhang (The Hong Kong University of Science and Technology) · Hongwei Qin (SenseTime Co.)
Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification
Jiancheng Zhang (Northwestern Polytechnical University Xi'an) · Haijin Zeng (IMEC & Universiteit Gent) · Yongyong Chen (Harbin Institute of Technology (Shenzhen)) · Dengxiu Yu (Northwest Polytechnical University) · Yinping Zhao (Northwestern Polytechnical University)
Dual Prior Unfolding for Snapshot Compressive Imaging
Jiancheng Zhang (Northwestern Polytechnical University Xi'an) · Haijin Zeng (IMEC & Universiteit Gent) · Jiezhang Cao (ETH Zürich) · Yongyong Chen (Harbin Institute of Technology (Shenzhen)) · Dengxiu Yu (Northwest Polytechnical University) · Yinping Zhao (Northwestern Polytechnical University)
Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising
Haijin Zeng (IMEC & Universiteit Gent) · Jiezhang Cao (ETH Zürich) · Yongyong Chen (Harbin Institute of Technology (Shenzhen)) · Kai Zhang (None) · Hiep Luong (Universiteit Gent - IMEC) · Wilfried Philips (Universiteit Gent)
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images
Guanlin Shen (Tsinghua University) · Jingwei Huang (Huawei Technologies Ltd.) · Zhihua Hu (Nanjing University of Information Science and Technology) · Bin Wang (Tsinghua University)
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Noor Ahmed (Saarland Informatics Campus, Max-Planck Institute) · Anna Kukleva (MPII) · Bernt Schiele (Max Planck Institute for Informatics)
$\textbf{LaRE}^2$: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
Yunpeng Luo (Tencent Youtu Lab) · Junlong Du (Tencent YouTu Lab) · Ke Yan · Shouhong Ding (Tencent Youtu Lab)
Modality-Collaborative Test-Time Adaptation for Action Recognition
Baochen Xiong (Institute of Automation, Chinese Academy of Sciences; Peng Cheng Lab) · Xiaoshan Yang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Yaguang Song (Peng Cheng Laboratory) · Yaowei Wang (Pengcheng Laboratory) · Changsheng Xu (None)
From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation
Yiwei Bao (Beihang University) · Feng Lu (Beihang University, Tsinghua University)
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
Takeru Oba (None) · Matthew Walter (Toyota Technological Institute at Chicago) · Norimichi Ukita (Toyota Technological Institute)
Learning Group Activity Features Through Person Attribute Prediction
Chihiro Nakatani (TTI-J) · Hiroaki Kawashima (University of Hyogo) · Norimichi Ukita (Toyota Technological Institute)
Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly (Mayo Clinic) · Abubakr Shafique (Mayo Clinic) · Peyman Nejat (Mayo Clinic) · Jibran Khan (Luther College) · Areej Alsaafin (Mayo Clinic) · Ghazal Alabtah (Mayo Clinic) · Hamid Tizhoosh (None)
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng (Rutgers University Google) · Qifei Wang (Google) · Wei Wei (Google) · Tingbo Hou (Google Research) · Matthias Grundmann (Google)
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang (Department of Computer Science, National Tsing Hua University) · Cheng-Hsuan Chiang (National Tsinghua University) · Tzu-Hsien Chen (National Tsinghua University) · Jun-Xiong Chong (National Tsing Hua University) · Tyng-Luh Liu (IIS/Academia Sinica) · Chiou-Ting Hsu (National Tsing Hua University)
MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang (Beihang University) · Shenghao Ren (Nanjing University) · Haolei Yuan (Beijing University of Aeronautics and Astronautics) · Jianhui Zhao (Beijing University of Aeronautics and Astronautics) · Fan Li (Beijing University of Aeronautics and Astronautics) · Shuangpeng Sun (Tsinghua University) · Zhenghao Liang (Tsinghua University) · Tao Yu (Tsinghua University) · Qiu Shen (Nanjing University) · Xun Cao (Nanjing University)
JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients
Woo Kyoung Han (Korea University) · Sunghoon Im (DGIST) · Jaedeok Kim (NVIDIA) · Kyong Hwan Jin (Korea University)
Towards Accurate and Robust Architectures via Neural Architecture Search
Yuwei Ou (Sichuan University) · Yuqi Feng (Sichuan University) · Yanan Sun (Sichuan University)
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li (Nankai University & Tencent) · Mingdeng Cao (The University of Tokyo) · Xintao Wang (Tencent) · Zhongang Qi (Tencent PCG ARC Lab) · Ming-Ming Cheng (Nankai University, Tsinghua University) · Ying Shan (Tencent)
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im (NAVER Cloud) · JeongYeon Nam (Naver Cloud) · Nokyung Park (NAVER) · Hyungmin Lee (NAVER) · Seunghyun Park (NAVER Cloud)
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
Jongha Kim (Korea University) · Jihwan Park (Korea University) · Jinyoung Park (Korea University) · Jinyoung Kim (Korea University) · Sehyung Kim (Korea University) · Hyunwoo J. Kim (Korea University)
CPR-Coach: Recognizing Composite Error Actions based on Single-class Training
Shunli Wang (Fudan University) · Shuaibing Wang (Fudan University) · Dingkang Yang (Fudan University) · Mingcheng Li (Fudan University) · Haopeng Kuang (Fudan University) · Xiao Zhao (None) · Liuzhen Su (Fudan University) · Peng Zhai (Fudan University) · Lihua Zhang (Fudan University)
Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Mingcheng Li (Fudan University) · Dingkang Yang (Fudan University) · Xiao Zhao (None) · Shuaibing Wang (Fudan University) · Yan Wang (Fudan University) · Kun Yang (Fudan University) · Mingyang Sun (Fudan University) · Dongliang Kou (Academy for Engineering and Technology, Fudan University, Shanghai, China) · Qian (Fudan University) · Lihua Zhang (Fudan University)
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim (Korea University) · Eulrang Cho (Samsung Research) · Sehyung Kim (Korea University) · Hyunwoo J. Kim (Korea University)
Language-conditioned Detection Transformer
Jang Hyun Cho (University of Texas, Austin) · Philipp Krähenbühl (University of Texas at Austin)
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li (Peking University) · Mingxu Zhang (Beijing University of Posts and Telecommunications) · Yiran Geng (Peking University) · Haoran Geng (Peking University) · Yuxing Long (Beijing University of Posts and Telecommunications) · Yan Shen (Peking University) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Jiaming Liu (Peking University) · Hao Dong (None)
A Simple Recipe for Language-guided Domain Generalized Segmentation
Mohammad Fahes (INRIA) · TUAN-HUNG VU (None) · Andrei Bursuc (valeo.ai) · Patrick Pérez (None) · Raoul de Charette (Inria)
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim (Kyung Hee University) · Hyeon Bae Kim (Kyung Hee University) · Jinyoung Moon (ETRI) · Jinwoo Choi (Kyung Hee University) · Seong Tae Kim (Kyung Hee University)
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu (Peking University) · Ran Xu (Beijing University of Posts and Telecommunications) · Senqiao Yang (Harbin Institute of Technology) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Qizhe Zhang (Peking University) · Zehui Chen (University of Science and Technology of China) · Yandong Guo (OPPO Research Institute) · Shanghang Zhang (Peking University)
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
Xiaobao Wei (University of the Chinese Academy of Sciences) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Jiarui Wu (Beijing University of Aeronautics and Astronautics) · Jiaming Liu (Peking University) · Ming Lu (Intel Labs China) · Yandong Guo (OPPO Research Institute) · Shanghang Zhang (Peking University)
Towards Progressive Multi-Frequency Representation for Image Warping
Jun Xiao (The Hong Kong Polytechnic University) · Zihang Lyu (The Hong Kong Polytechnic University) · Cong Zhang (Hong Kong Polytechnic University) · Yakun Ju (Nanyang Technological University) · Changjian Shui (Vector Institute) · Kin-man Lam (The Hong Kong Polytechnic University)
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco (Università degli Studi di Genova, Istituto Italiano di Tecnologia) · Stefano Berti (Università degli Studi di Genova, Istituto Italiano di Tecnologia) · Giulia Pasquale (Istituto Italiano di Tecnologia) · Damiano Malafronte (Istituto Italiano di Tecnologia) · Shogo Sato (Sony Interactive Entertainment Inc.) · Hiroyuki Segawa (Sony Interactive Entertainment) · Tetsugo Inada (Sony Interactive Entertainment) · Lorenzo Natale (Istituto Italiano di Tecnologia)
Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu (Chiba University) · Kazuhiko Kawamoto (Chiba University) · Hiroshi Kera (Chiba University)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury (None) · Sayan Nag (University of Toronto) · Joseph K J (Adobe Research) · Balaji Vasan Srinivasan (Adobe Research) · Dinesh Manocha (University of Maryland, College Park)
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Ivan Rodin (University of Catania) · Antonino Furnari (University of Catania) · Kyle Min (Intel Labs) · Subarna Tripathi (Intel Corporation) · Giovanni Maria Farinella (University of Catania, Italy)
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
Kranthi Kumar Rachavarapu (Indian Institute of Technology Madras) · Kalyan Ramakrishnan (University of Oxford) · A. N. Rajagopalan (Indian Institute of Technology Madras)
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li (SIST, ShanghaiTech University) · Songyang Zhang (Shanghai AI Laboratory) · Dahua Lin (The Chinese University of Hong Kong) · Kai Chen (Shanghai AI Laboratory) · Xuming He (ShanghaiTech University)
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
Rongjie Li (SIST, ShanghaiTech University) · Yu Wu (ShanghaiTech University) · Xuming He (ShanghaiTech University)
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
Zeeshan Hayder (CSIRO) · Xuming He (ShanghaiTech University)
Point Cloud Pre-training with Diffusion Models
xiao zheng (None) · Xiaoshui Huang (Shanghai AI Laboratory) · Guofeng Mei (Fondazione Bruno Kessler) · Zhaoyang Lyu (Shanghai AI Laboratory) · Yuenan Hou (Shanghai AI Laboratory) · Wanli Ouyang (University of Sydney) · Bo Dai (Shanghai AI Laboratory) · Yongshun Gong (Shandong University)
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao (Peking University) · Yuntao Du (Nanjing University) · Xintong Zhang (Beijing Institute for General Artificial Intelligence) · Xiaojian Ma (University of California, Los Angeles) · Wenjuan Han (Beijing Jiaotong University) · Song-Chun Zhu (UCLA) · Qing Li (Beijing Institute for General Artificial Intelligence (BIGAI))
Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering
Suyuan Liu (National University of Defense Technology) · KE LIANG (National University of Defense Technology) · Zhibin Dong (National University of Defense Technology) · Siwei Wang (Academy of Military Sciences) · Xihong Yang (National University of Defense Technology) · sihang zhou (National University of Defense Technology) · En Zhu (National University of Defense Technology) · Xinwang Liu (National University of Defense Technology)
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
Xiaoyang Wu (The University of Hong Kong) · Zhuotao Tian (The Chinese University of Hong Kong) · Xin Wen (The University of Hong Kong) · Bohao Peng (The Chinese University of Hong Kong) · Xihui Liu (The University of Hong Kong) · Kaicheng Yu (Alibaba Group) · Hengshuang Zhao (The University of Hong Kong)
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang (ShanghaiTech University) · Molin Huang (Shanghaitech University) · Yuxuan Zhou (None) · Juze Zhang (ShanghaiTech University) · Jingyi Yu (ShanghaiTech University) · Jingya Wang (ShanghaiTech University) · Lan Xu (ShanghaiTech University)
OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
Geng Xinyu (None) · Jiaming Wang (Harbin Institute of Technology) · Jiawei Gong (Harbin Institute of Technology) · yuerong xue (Harbin Institute of Technology) · Jun Xu (Harbin Institute of Technology) · Fanglin Chen (Harbin Institute of Technology (Shenzhen)) · Xiaolin Huang (Shanghai Jiao Tong University, Tsinghua University)
Unbiased Estimator for Distorted Conic in Camera Calibration
Chaehyeon Song (Seoul National University) · Jaeho Shin (Seoul National University) · Myung-Hwan Jeon (Seoul National University) · Jongwoo Lim (Seoul National University) · Ayoung Kim (Seoul National University)
Posterior Distillation Sampling
Juil Koo (KAIST) · Chanho Park (KAIST) · Minhyuk Sung (KAIST)
A Vision Check-up for Language Models
Pratyusha Sharma (Massachusetts Institute of Technology) · Tamar Rott Shaham (MIT) · Manel Baradad (Massachusetts Institute of Technology) · Stephanie Fu (University of California, Berkeley) · Adrian Rodriguez-Munoz (Massachusetts Institute of Technology) · Shivam Duggal (Massachusetts Institute of Technology) · Phillip Isola (None) · Antonio Torralba (MIT)
State Space Models for Event Cameras
Nikola Zubic (Robotics and Perception Group, University of Zurich and ETH Zurich) · Mathias Gehrig (University of Zurich) · Davide Scaramuzza (University of Zurich)
Weak-to-Strong 3D Object Detection with X-Ray Distillation
Alexander Gambashidze (AIRI) · Aleksandr Dadukin (Higher School of Economics) · Maksim Golyadkin (AIRI) · Maria Razzhivina (Higher School of Economics) · Ilya Makarov (Moscow State Institute of Steel and Alloys)
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
Anh-Quan Cao (INRIA) · Angela Dai () · Raoul de Charette (Inria)
Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai (Department of Computer Science, University of Wisconsin, Madison) · Haotian Liu (University of Wisconsin-Madison) · Siva Mustikovela (Heidelberg University) · Gregory P. Meyer (Cruise) · Yuning Chai (Cruise) · Dennis Park (Toyota Research Institute) · Yong Jae Lee (Department of Computer Sciences, University of Wisconsin - Madison)
Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
Hong Chen (Huazhong University of Science and Technology) · Pei Yan (Huazhong University of Science and Technology) · sihe xiang (None) · Yihua Tan (Huazhong University of Science and Technology)
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen (University of Texas at Austin) · Kumar Ashutosh (UT Austin & FAIR, Meta) · Rohit Girdhar (Meta) · David Harwath (University of Texas, Austin) · Kristen Grauman (University of Texas at Austin)
Describing Differences in Image Sets with Natural Language
Lisa Dunlap (University of California, Berkeley) · Yuhui Zhang (Stanford University) · Xiaohan Wang (Stanford University) · Ruiqi Zhong (University of California Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Jacob Steinhardt (University of California Berkeley) · Joseph Gonzalez (University of California - Berkeley) · Serena Yeung (Stanford)
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Kraus (Recursion) · Kian Kenyon-Dean (Recursion Pharma) · Saber Saberian (Recursion Pharma) · Maryam Fallah (Recursion Pharmaceuticals) · Peter McLean (Recursion) · Jess Leung (Recursion) · Vasudev Sharma (Recursion) · Ayla Khan (University of Utah) · Jia Balakrishnan (Recursion Pharmaceuticals) · Safiye Celik (Recursion) · Dominique Beaini (Valence Labs) · Maciej Sypetkowski (Valence Labs) · Chi Cheng (Boston University) · Kristen Morse (Recursion) · Maureen Makes (University of Utah) · Ben Mabey (None) · Berton Earnshaw (University of Utah)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang (Northwestern University) · Jong-Chyi Su (None) · Samuel Schulter (NEC Laboratories America) · Sparsh Garg (NEC Laboratories America) · Shiyu Zhao (Rutgers University, New Brunswick) · Ying Wu (Northwestern University) · Manmohan Chandraker (UC San Diego)
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh (Australian National University) · Jianming Zhang (Adobe Systems) · Qing Liu (Adobe Systems) · Cameron Smith (Adobe Systems) · Zhe Lin (Adobe Research) · Liang Zheng (Australian National University)
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie (Jilin University) · Chu-Jun Peng (National Yang Ming Chiao Tung University) · Yu-Wen Tseng (Department of Computer Science and Information Engineering, National Taiwan University) · Hung-Jen Chen (National Yang Ming Chiao Tung University) · Chan-Feng Hsu (National Chiao Tung University) · Hong-Han Shuai (National Yang Ming Chiao Tung University) · Wen-Huang Cheng (National Taiwan University)
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Lei Li (Technical University of Munich) · Angela Dai ()
Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation
Ba Hung Ngo (Chonnam National University) · Nhat-Tuong Do-Tran (National Yang Ming Chiao Tung University) · Tuan-Ngoc Nguyen (FPT Telecom) · Hae-Gon Jeon (GIST) · Tae Jong Choi (Chonnam National University)
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu (LY Corporation) · Tsubasa Takahashi (LY Corporation) · Hirokatsu Kataoka (LY Corporation)
View From Above: Orthogonal viewpoint aware Cross-view Localization
Shan Wang (ANU; CSIRO) · Chuong Nguyen (None) · Jiawei Liu (Australian National University) · Yanhao Zhang (University of Technology Sydney) · Sundaram Muthu (CSIRO) · Fahira Afzal Maken (CSIRO) · Kaihao Zhang (Australian National University) · Hongdong Li (Australian National University)
Self-Calibrating Vicinal Risk Minimisation for Model Calibration
Jiawei Liu (Australian National University) · Changkun Ye (Australian National University) · Ruikai Cui (Australian National University) · Nick Barnes (Australian National University)
CAT-Seg: Cost Aggregation for Open-vocabulary Semantic Segmentation
Seokju Cho (Korea University) · Heeseong Shin (Korea University) · Sunghwan Hong (Korea University) · Anurag Arnab (Google) · Paul Hongsuck Seo (Google) · Seungryong Kim (Korea University)
Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching
Rui Gong (Nanyang Technological University) · Weide Liu (Harvard University) · ZAIWANG GU (None) · Xulei Yang (Institute for Infocomm Research (I2R), A*STAR) · Jun Cheng (Institute For Infocomm Research, A*STAR)
Time-, Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea (University of Tübingen) · Alexey Gritsenko (Google) · Cordelia Schmid (Inria / Google) · Anurag Arnab (Google)
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang (Korea Advanced Institute of Science and Technology) · Fei Pan (University of Michigan - Ann Arbor) · Junmo Kim (Korea Advanced Institute of Science and Technology) · In So Kweon (Korea Advanced Institute of Science and Technology) · Chengzhi Mao (Columbia University)
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Julio Silva-Rodríguez (ETS Montreal) · Sina Hajimiri (École de technologie supérieure, Université du Québec) · Ismail Ben Ayed (ETS Montreal) · Jose Dolz (École de technologie supérieure)
StegoGAN: Bootstrapping Non-bijective Image-to-Image Translation with CycleGAN Steganography
Sidi Wu (ETH Zurich) · Yizi Chen (ETHZ - ETH Zurich) · Loic Landrieu (ENPC, IGN) · Nicolas Gonthier (IGN) · Samuel Mermet (Ecole Nationale des Sciences Géographiques) · Lorenz Hurni (ETHZ - ETH Zurich) · Konrad Schindler (ETH Zurich)
Neural Lineage
Runpeng Yu (National University of Singapore) · Xinchao Wang (National University of Singapore)
Weakly Supervised Video Individual Counting
Xinyan Liu (None) · Guorong Li (University of Chinese Academy of Sciences) · Yuankai Qi (The University of Adelaide) · Ziheng Yan (University of Chinese Academy of Sciences) · Zhenjun Han (University of the Chinese Academy of Sciences) · Anton van den Hengel (University of Adelaide) · Ming-Hsuan Yang (University of California at Merced) · Qingming Huang (University of Chinese Academy of Sciences)
Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
Qiyuan Dai (ShanghaiTech University) · Sibei Yang (None)
Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models
Kota Sueyoshi (Osaka University) · Takashi Matsubara (Hokkaido University)
MICap: A Unified Model for Identity-aware Movie Descriptions
Haran Raajesh (International Institute of Information Technology Hyderabad) · Naveen Reddy Desanur (International Institute of Information Technology Hyderabad) · Zeeshan Khan (INRIA) · Makarand Tapaswi (IIIT Hyderabad, Wadhwani AI)
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu (National Taiwan University) · Andy Chia-Hao Chang (National Yang Ming Chiao Tung University) · Chieh-Yu Chuang (National Yang Ming Chiao Tung University) · Chun-Pei Chen (National Yang Ming Chiao Tung University) · Yu-Lun Liu (National Yang Ming Chiao Tung University) · Min-Hung Chen (NVIDIA) · Hou-Ning Hu (MediaTek Inc.) · Yung-Yu Chuang (National Taiwan University) · Yen-Yu Lin (National Yang Ming Chiao Tung University)
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen (University of Arkansas) · Pha Nguyen (University of Arkansas) · Khoa Luu (University of Arkansas)
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang (Columbia University) · Yuhao Wu (Duke University) · Elsa Angelini (Télécom ParisTech) · Ang Li (University of Maryland, College Park) · Jia Guo (Columbia University) · Jerod Rasmussen (University of California, Irvine) · Thomas O'Connor (University of Rochester) · Pathik Wadhwa (University of California, Irvine) · Andrea Jackowski (None) · Hai Li (Duke University) · Jonathan Posner (Duke University) · Andrew Laine (Columbia University) · Yun Wang (Emory University)
PREGO: online mistake detection in PRocedural EGOcentric videos
Alessandro Flaborea (Sapienza University of Rome / ItalAI) · Guido M. D'Amely di Melendugno (University of Roma "La Sapienza") · Leonardo Plini (Sapienza University of Rome & INFN) · Luca Scofano (University of Roma "La Sapienza") · Edoardo De Matteis (Sapienza University) · Antonino Furnari (University of Catania) · Giovanni Maria Farinella (University of Catania, Italy) · Fabio Galasso (None)
Masked and Shuffled Blind Spot Denoising for Real-World Images
Hamadi Chihaoui (University of Bern) · Paolo Favaro (Institute of Computer Science, University of Bern)
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
Chen Liu (The University of Queensland) · Peike Li (Futureverse AI) · Qingtao Yu (Australian National University) · Hongwei Sheng (University of Queensland) · Dadong Wang (CSIRO) · Lincheng Li () · Xin Yu (University of Queensland)
Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds
Heejoon Moon (Hanyang University) · Chunghwan Lee (Hanyang University) · Je Hyeong Hong (Hanyang University)
Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model
Runmin Dong (Tsinghua University) · Shuai Yuan (The University of Hong Kong) · Bin Luo (Tsinghua University) · Mengxuan Chen (Tsinghua University) · Jinxiao Zhang (Tsinghua University) · Lixian Zhang (National Supercomputing Center in Shenzhen) · Weijia Li (Sun Yat-sen University) · Juepeng Zheng (Sun Yat-Sen University) · Haohuan Fu (Tsinghua University)
LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding
Min Liang (University of Science and Technology Beijing) · Jia-Wei Ma (University of Science and Technology Beijing) · Xiaobin Zhu (University of Science and Technology Beijing) · Jingyan Qin (University of Science and Technology Beijing) · Xu-Cheng Yin (University of Science and Technology Beijing)
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang (University of California, San Diego) · Yang Fu (University of California San Diego) · Zheng Ding (University of California, San Diego) · Sifei Liu (NVIDIA) · Zhuowen Tu (University of California, San Diego) · Xiaolong Wang (UCSD)
Restoration by Generation with Constrained Priors
Zheng Ding (University of California, San Diego) · Xuaner Zhang (Adobe) · Zhuowen Tu (University of California, San Diego) · Zhihao Xia (Adobe Systems)
Low-Latency Neural Stereo Streaming
Qiqi Hou (Qualcomm Inc.) · Farzad Farhadzadeh (Qualcomm Inc.) · Amir Said (Qualcomm Inc.) · Guillaume Sautiere (Qualcomm Inc.) · Hoang Le (Qualcomm AI Research)
SODA: Bottleneck Diffusion Models for Representation Learning
Drew Hudson (Google DeepMind) · Daniel Zoran (DeepMind) · Mateusz Malinowski (MoonValley AI) · Andrew Lampinen (Google DeepMind) · Andrew Jaegle (Google DeepMind) · James McClelland (Stanford University and Google DeepMind) · Loic Matthey (DeepMind) · Felix Hill (Google) · Alexander Lerchner (Google DeepMind)
Flow-Guided Online Stereo Rectification for Wide Baseline Stereo
Anush Kumar (Torc Robotics) · Fahim Mannan () · Omid Hosseini Jafari (Torc Robotics) · Shile Li (Torc Robotics) · Felix Heide (Department of Computer Science, Princeton University)
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou (Queen Mary University of London) · Enrique Sanchez (Samsung AI Center Cambridge) · Georgios Tzimiropoulos (Queen Mary University London)
Language-driven Grasp Detection
An Dinh Vuong (FPT Software - AI Center) · Minh Nhat VU (ACIN Institute, TU Wien / Austrian Institute of Technology) · Baoru Huang (University College London, University of London) · Nghia Nguyen (FPT Software) · Hieu Le (FPT Software AI Center) · Thieu Vo (Ton Duc Thang University) · Anh Nguyen (University of Liverpool)
YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection
Alon Zolfi (Ben-Gurion University of the Negev) · Guy AmiT (Ben-Gurion University of the Negev) · Amit Baras () · Satoru Koda (Fujitsu Limited) · Ikuya Morikawa (Fujitsu Research) · Yuval Elovici (Ben Gurion University of the Negev) · Asaf Shabtai (Ben-Gurion University of the Negev)
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang (Dalian University of Technology) · Yuhao Wang (Dalian University of Technology) · Yang Liu (Dalian University of Technology) · Zhengzheng Tu (Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University) · Huchuan Lu (Dalian University of Technology)
Revisiting Counterfactual Problems in Referring Expression Comprehension
Zhihan Yu (Beijing University of Posts and Telecommunications) · Ruifan Li (Beijing University of Post and Telecommunication)
Towards Robust Learning to Optimize with Theoretical Guarantees
Qingyu Song (The Chinese University of Hong Kong) · Wei Lin (The Chinese University of Hong Kong) · Juncheng Wang (Hong Kong Baptist University) · Hong Xu (CUHK)
JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
Duy Tho Le (Monash University) · Chenhui Gou (Monash University) · Stavya Datta (Monash University) · Hengcan Shi (None) · Ian Reid (University of Adelaide) · Jianfei Cai (Monash University) · Hamid Rezatofighi (Monash University)
Boosting Image Restoration via Priors from Pre-trained Models
Xiaogang Xu (Zhejiang Lab) · Shu Kong (University of Macau, Texas A&M University) · Tao Hu (National University of Singapore) · Zhe Liu (Zhejiang Lab) · Hujun Bao (Zhejiang University)
Unsupervised 3D Structure Inference from Category-Specific Image Collections
Weikang Wang (Rheinische Friedrich-Wilhelms Universität Bonn) · Dongliang Cao (University of Bonn) · Florian Bernard (University of Bonn)
3D Neural Edge Reconstruction
Lei Li (ETH Zurich) · Songyou Peng (ETH Zurich & MPI Tübingen) · Zehao Yu (None) · Shaohui Liu (ETH Zurich) · Rémi Pautrat (Microsoft Mixed Reality & AI lab) · Xiaochuan Yin (Utopilot) · Marc Pollefeys (ETH Zurich / Microsoft)
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Ajian Liu (NLPR, CASIA) · Shuai Xue (Beijing Institute of Technology) · Gan Jianwen (Macao University of Science and Technology) · Jun Wan () · Yanyan Liang (Macau University of Science and Technology) · Jiankang Deng (Imperial College London & Huawei UKRD) · Sergio Escalera (Computer Vision Center) · Zhen Lei (Institute of Automation, Chinese Academy of Sciences)
VOODOO 3D: VOlumetric pOrtrait Disentanglement fOr Online 3D head reenactment
Phong Tran (MBZUAI) · Egor Zakharov (ETH Zurich) · Long Nhat Ho (Mohamed bin Zayed University of Artificial Intelligence) · Anh Tran (VinAI Research) · Liwen Hu (Pinscreen) · Hao Li (Mohamed bin Zayed University of Artificial Intelligence)
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin (Xi'an Jiaotong University)
MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang (The Chinese University of Hong Kong) · Ruiyuan Gao (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Xiaosen Wang (Huazhong University of Science and Technology) · Tsung-Yi Ho (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Xu Nan (Institute of Automation, Chinese Academy of Sciences) · Qiang Xu (The Chinese University of Hong Kong)
Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory
飞 叶 (University of York) · Adrian Bors (MBZUAI)
Named Entity Driven Zero-Shot Image Manipulation
Zhida Feng (Wuhan University of Science and Technology) · Li Chen (Wuhan University of Science and Technology) · Jing Tian (National University of Singapore) · Jiaxiang Liu (Baidu) · Shikun Feng (Baidu)
Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
Zhaoyang Sun (Wuhan University of Technology) · Shengwu Xiong (Wuhan University of Technology) · Yaxiong Chen (Wuhan University of Technology) · Yi Rong (Wuhan University of Technology)
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
David Stotko (University of Bonn) · Nils Wandel (University of Bonn) · Reinhard Klein (University of Bonn)
SIRA: Scalable Inter-frame Relation and Association for Radar Perception
Ryoma Yataka (Mitsubishi Electric Research Laboratories (MERL)) · Pu (Perry) Wang (None) · Petros Boufounos (Mitsubishi Electric Research Laboratories) · Ryuhei Takahashi (Mitsubishi Electric Corporation)
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li (Huazhong University of Science and Technology) · Laurence Yang (Hainan University) · Bocheng Ren (None) · Xin Nie (Huazhong University of Science and Technology) · Zhangyang Gao (Westlake University, China) · Cheng Tan (Zhejiang University & Westlake University) · Stan Z. Li (Westlake University)
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li (Huazhong University of Science and Technology) · Zhangyang Gao (Westlake University, China) · Cheng Tan (Zhejiang University & Westlake University) · Bocheng Ren (None) · Laurence Yang (Hainan University) · Stan Z. Li (Westlake University)
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu (Carnegie Mellon University) · Peter Schaldenbrand (Carnegie Mellon University) · Beverley-Claire Okogwu (Carnegie Mellon University) · Wenxuan Peng (Nanyang Technological University) · Youngsik Yun (Dongguk University) · Andrew Hundt (Carnegie Mellon University) · Jihie Kim (Dongguk University) · Jean Oh (Carnegie Mellon University)
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song (Dealicious Inc) · Taebaek Hwang (None) · Jooyoung Yoon (Dealicious Inc) · Shunghyun Choi (Dealicious Inc.) · Yeong Hyeon Gu (Sejong University)
You Only Need Less Attention Each Stage in Vision Transformers
Shuoxi Zhang (Huazhong University of Science and Technology) · Hanpeng Liu (Huazhong University of Science and Technology) · Stephen Lin (Microsoft Research) · Kun He (Huazhong University of Science and Technology)
MoST: Multi-modality Scene Tokenization for Motion Prediction
Norman Mu (UC Berkeley) · Jingwei Ji (Waymo LLC) · Zhenpei Yang (Waymo LLC) · Nathan Harada (Waymo LLC) · Haotian Tang (Massachusetts Institute of Technology) · Kan Chen (Waymo) · Charles R. Qi (Waymo) · Runzhou Ge (Waymo) · Kratarth Goel (Waymo) · Zoey Yang (Waymo) · Scott Ettinger (Waymo LLC) · Rami Al-Rfou (Waymo) · Dragomir Anguelov (Waymo) · Yin Zhou (Waymo)
Object Dynamics Modeling with Hierarchical Point Cloud-based Representations
Chanho Kim (Oregon State University) · Li Fuxin (Oregon State University)
Distilling Vision-Language Models on Millions of Videos
Yue Zhao (UT Austin) · Long Zhao (Google Research) · Xingyi Zhou (Google) · Jialin Wu (Google) · Chun-Te Chu (Google Research) · Hui Miao (Google) · Florian Schroff (Google) · Hartwig Adam (Google Research) · Ting Liu (Google Research) · Boqing Gong (Google) · Philipp Krähenbühl (University of Texas at Austin) · Liangzhe Yuan (Google)
Structure-from-Motion from Pixel-wise Correspondences
Philipp Lindenberger (Department of Computer Science, ETHZ - ETH Zurich) · Paul-Edouard Sarlin (ETH Zurich) · Marc Pollefeys (ETH Zurich / Microsoft)
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Linqi Zhou (Stanford University) · Andy Shih (Stanford University) · Chenlin Meng (None) · Stefano Ermon (Stanford University)
Improved Visual Grounding through Self-Consistent Explanations
Ruozhen He (Rice University) · Paola Cascante-Bonilla (Rice University) · Ziyan Yang (Rice University) · Alex Berg (None) · Vicente Ordonez (Rice University)
Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation
Yuan Xiao (Nanjing University) · Shiqing Ma (University of Massachusetts at Amherst) · Juan Zhai (University of Massachusetts at Amherst) · Chunrong Fang (Nanjing University) · Jinyuan Jia (Pennsylvania State University) · Zhenyu Chen (Nanjing University)
Efficient Detection of Long Consistent Cycles and its Application to Distributed Synchronization
Shaohan Li (University of Minnesota, Minneapolis) · Yunpeng Shi (University of California, Davis) · Gilad Lerman (University of Minnesota, Minneapolis)
UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization
Shuaibo Li (Beijing University of Technology & Institute of Automation, Chinese Academy of Sciences) · Wei Ma (Beijing University of Technology) · Jianwei Guo (Institute of Automation, Chinese Academy of Sciences) · Shibiao Xu (Beijing University of Posts and Telecommunications) · Benchong Li (Beijing University of Technology) · Xiaopeng Zhang (Institute of Automation, Chinese Academy of Sciences)
Learning to Control Camera Exposure via Reinforcement Learning
Kyunghyun Lee (LG AI Research) · Ukcheol Shin (Carnegie Mellon University (CMU)) · Byeong-Uk Lee (KRAFTON, Inc.)
Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion
Su Sun (Purdue University) · Henry Zhao (Bosch Research) · Yuliang Guo (Bosch US Research) · Ruoyu Wang (Bosch) · Xinyu Huang (Robert Bosch Research NA) · Yingjie Victor Chen (Purdue University) · Liu Ren (Bosch Research)
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou (University of Mannheim) · Xudong Yan (City University of Macau) · Zhi-Qi Cheng (Carnegie Mellon University) · Yan Yan (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Qi Dai (Microsoft Research Asia) · Xian-Sheng Hua (Terminus Group)
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma (Purdue University) · Can Cui (Purdue University) · Xu Cao (University of Illinois Urbana-Champaign) · Wenqian Ye (University of Virginia) · Peiran Liu (Purdue University) · Juanwu Lu (Purdue University) · Amr Abdelraouf (None) · Rohit Gupta (Toyota Motor Corporation) · Kyungtae Han (Toyota Motor North America) · Aniket Bera (Purdue University) · James Rehg (None) · Ziran Wang (Purdue University)
MAPLM: A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
Xu Cao (University of Illinois Urbana-Champaign) · Tong Zhou (Tencent AI Lab) · Yunsheng Ma (Purdue University) · Wenqian Ye (University of Virginia) · Can Cui (Purdue University) · Kun Tang (Tencent) · Zhipeng Cao (Tencent) · Kaizhao Liang (University of Texas at Austin) · Ziran Wang (Purdue University) · James Rehg (None) · chao zheng (Tencent)
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Razvan Pasca (None) · Alexey Gavryushin (ETHZ - ETH Zurich) · Muhammad Hamza (Department of Informatics, University of Zurich, University of Zurich) · Yen-Ling Kuo (University of Virginia, Charlottesville) · Kaichun Mo (NVIDIA Research) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Otmar Hilliges (None) · Xi Wang (None)
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Zhan Li (OPPO US Research Center & Portland State University) · Zhang Chen (OPPO US Research Center) · Zhong Li (InnoPeak Technology) · Yi Xu (OPPO US Research Center)
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
Xiaoyang Chen (University of Pennsylvania, University of Pennsylvania) · Hao Zheng (University of Pennsylvania, University of Pennsylvania) · Yuemeng LI (University of Pennsylvania) · Yuncong Ma (University of Pennsylvania, University of Pennsylvania) · Liang Ma (University of Pennsylvania, University of Pennsylvania) · Hongming Li (University of Pennsylvania, University of Pennsylvania) · Yong Fan (University of Pennsylvania)
Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Juanwu Lu (Purdue University) · Can Cui (Purdue University) · Yunsheng Ma (Purdue University) · Aniket Bera (Purdue University) · Ziran Wang (Purdue University)
Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
Jihyun Kim (Yonsei University, LG Electronics) · Changjae Oh (Queen Mary University London) · Hoseok Do (LG Electronics) · Soohyun Kim (Korea University) · Kwanghoon Sohn (Yonsei University)
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Myeongseob Ko (Virginia Polytechnic Institute and State University) · Feiyang Kang (Virginia Polytechnic Institute and State University) · Weiyan Shi (Stanford University) · Ming Jin (Virginia Tech) · Zhou Yu (Columbia University) · Ruoxi Jia (Virginia Tech)
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
Junhao Dong (Nanyang Technological University) · Piotr Koniusz (Australian National University) · Junxi Chen (SUN YAT-SEN UNIVERSITY) · Z. Wang (University of British Columbia) · Yew-Soon Ong (Nanyang Technological University)
Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
Youngmin Chung (Sung Kyun Kwan University) · Ji Hun Ha (Sung Kyun Kwan University) · Kyeong Chan Im (Sungkyunkwan University) · Joo Sang Lee (Sungkyunkwan University)
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Fanghua Yu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Jinjin Gu (University of Sydney) · Zheyuan Li (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Chinese Academy of Sciences) · Jinfan Hu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Chinese Academy of Sciences) · Xiangtao Kong (Hong Kong Polytechnic University) · Xintao Wang (Tencent) · Jingwen He (Shanghai AI Lab) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Chao Dong (SIAT)
Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning
Christopher Liao (None) · Theodoros Tsiligkaridis (MIT Lincoln Laboratory, Massachusetts Institute of Technology) · Brian Kulis (Boston University)
Referring Image Editing: Object-level Image Editing via Referring Expressions
Chang Liu (None) · Xiangtai Li (Nanyang Technological University) · Henghui Ding (Fudan University)
BiPer: Binary Neural Networks using a Periodic Function
Edwin Vargas (Universidad Industrial de Santander) · Claudia Correa (Universidad Industrial de Santander) · Carlos Hinojosa (KAUST) · Henry Arguello (Universidad Industrial de Santander)
Streaming Dense Video Captioning
Xingyi Zhou (Google) · Anurag Arnab (Google) · Shyamal Buch (Google) · Shen Yan (Google Research) · Austin Myers (Google) · Xuehan Xiong (Google) · Arsha Nagrani (Google ) · Cordelia Schmid (Inria / Google)
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min (POSTECH) · Shyamal Buch (Google) · Arsha Nagrani (Google ) · Minsu Cho (POSTECH) · Cordelia Schmid (Inria / Google)
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Inhwan Bae (GIST) · Junoh Lee (Gwangju Institute of Science and Technology) · Hae-Gon Jeon (GIST)
SNI-SLAM: Semantic Neural Implicit SLAM
Siting Zhu (Shanghai Jiao Tong University) · Guangming Wang (University of Cambridge) · Hermann Blum (Computer Vision and Geometry Lab, ETH Zürich) · Jiuming Liu (Shanghai Jiao Tong University) · LiangSong (China University of Mining Technology - Xuzhou) · Marc Pollefeys (ETH Zurich / Microsoft) · Hesheng Wang (Shanghai Jiao Tong University)
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Li Hu (Alibaba)
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Shijie Zhou (University of California, Los Angeles) · Haoran Chang (University of California, Los Angeles) · Sicheng Jiang (University of California, Los Angeles) · Zhiwen Fan (University of Texas, Austin) · Zehao Zhu (University of Texas at Austin) · Dejia Xu (University of Texas at Austin) · Pradyumna Chari (University of California, Los Angeles) · Suya You (University of Southern California) · Zhangyang Wang (University of Texas at Austin) · Achuta Kadambi (UCLA)
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song (Dealicious Inc) · Jooyoung Yoon (Dealicious Inc) · Taebaek Hwang (None) · Shunghyun Choi (Dealicious Inc.) · Yeong Hyeon Gu (Sejong University) · Yannis Avrithis (IARAI)
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang (Korea University) · Dong In Lee (Korea University) · MinHyuk Jang (Korea University) · Jong Wook Kim (Korea University) · Feng Yang (Google Research) · Sangpil Kim (Korea University)
PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates
Ruoqi Wang (The Hong Kong University of Science and Technology (Guangzhou)) · Zhuoyang Chen (The Hong Kong University of Science and Technology (Guangzhou)) · Jiayi Zhu (Hong Kong University of Science and Technology (Guangzhou)) · Qiong Luo (Hong Kong University of Science and Technology) · Feng Wang (Guangzhou University)
Disentangled Pre-training for Human-Object Interaction Detection
Zhuolong Li (South China University of Technology) · Xingao Li (South China University of Technology) · Changxing Ding (South China University of Technology) · Xiangmin Xu (South China University of Technology)
Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation
Fahimeh Hosseini Noohdani (Sharif University of Technology) · Parsa Hosseini (Sharif University of Technology) · Aryan Yazdan Parast (Sharif University of Technology) · Hamidreza Araghi (Sharif University of Technology) · Mahdieh Baghshah (Sharif University of Technology)
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni (The Pennsylvania State University) · Bernhard Egger (Massachusetts Institute of Technology) · Suhas Lohit (Mitsubishi Electric Research Labs) · Anoop Cherian (Mitsubishi Electric Research Labs (MERL)) · Ye Wang (Mitsubishi Electric Research Labs) · Toshiaki Koike-Akino (Mitsubishi Electric Research Labs. (MERL)) · Sharon X. Huang (Pennsylvania State University) · Tim Marks (None)
Can Biases in ImageNet Models Explain Generalization?
Paul Gavrikov (Offenburg University) · Janis Keuper (Institute for Machine Learning and Analytics, Offenburg University)
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Xiangyang Zhu (City University of Hong Kong) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Bowei He (City University of Hong Kong) · Ziyu Guo (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Jiaming Liu (Peking University) · Han Xiao (The Chinese University of Hong Kong & Shanghai AI Laboratory) · Chaoyou Fu (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Hao Dong (None) · Peng Gao (The Chinese University of Hong Kong)
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli (Rutgers University) · Jindong Jiang (Rutgers University) · Di Liu (Rutgers University, New Brunswick) · Licheng Yu (None) · Xiaoliang Dai (Facebook) · Ankit Ramchandani (Meta Platforms, Inc.) · Guan Pang (Facebook) · Dimitris N. Metaxas (Rutgers) · Praveen Krishnan (Meta AI)
Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning
Jiahan Li (Harbin Institute of Technology) · Jiuyang Dong (Harbin Institute of Technology) · Shenjin Huang (None) · Xi Li (Department of Gastroenterology, Shenzhen Hospital, Peking University) · Junjun Jiang (Harbin Institute of Technology) · Xiaopeng Fan (Harbin Institute of Technology) · Yongbing Zhang (Harbin Institute of Technology)
CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution
Qingguo Liu (Nanjing University of Aeronautics and Astronautics) · Chenyi Zhuang (Nanjing University of Aeronautics and Astronautics) · Pan Gao (Nanjing University of Aeronautics and Astronautics, Tsinghua University) · Jie Qin (Nanjing University of Aeronautics and Astronautics)
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Nagasinghe (Mohamed bin Zayed University of Artificial Intelligence) · Honglu Zhou (Rutgers University) · Malitha Gunawardhana (University of Auckland) · Martin Renqiang Min (NEC Laboratories America) · Daniel Harari (Weizmann Institute of Science) · Muhammad Haris Khan (None)
iToF-flow-based High Frame Rate Depth Imaging
Yu Meng (Nanjing University) · Zhou Xue (Li Auto) · Xu Chang (Bytedance Inc) · Xuemei Hu (Nanjing University) · Tao Yue (Nanjing University)
Continual Motion Prediction Learning Framework via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy
Dae Jun Kang (None) · Dongsuk Kum (Korea Advanced Institute of Science and Technology) · Sanmin Kim (KAIST)
Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
Leili Goli (University of Toronto) · Cody Reading (Simon Fraser University) · Silvia Sellán (University of Toronto) · Alec Jacobson (University of Toronto and Adobe Systems) · Andrea Tagliasacchi (Simon Fraser University, Google Brain)
HIT: Estimating Internal Human Implicit Tissues from the Body Surface
Marilyn Keller (Max Planck Institute for Intelligent Systems) · Vaibhav ARORA (INRIA) · Abdelmouttaleb Dakri (None) · Shivam Chandhok (INRIA) · Jürgen Machann (Institute for Diabetes Research and Metabolic Diseases, Helmholtz Center Munich at the University of Tuebingen) · Andreas Fritsche (Eberhard-Karls-Universität Tübingen) · Michael J. Black (University of Tübingen) · Sergi Pujades (INRIA)
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
Jin Yang (Xi'an Jiaotong University) · Ping Wei (None) · Huan Li (Xi'an Jiaotong University) · Ziyang Ren (Xi'an Jiaotong University)
Fitting Flats to Flats
Gabriel Dogadov (Technische Universität Berlin) · Ugo Finnendahl (Technische Universität Berlin) · Marc Alexa (TU Berlin)
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Hao Li (The Chinese University of Hong Kong) · Xue Yang (Shanghai AI Laboratory) · Zhaokai Wang (Shanghai Jiao Tong University) · Xizhou Zhu (Shanghai AI Laboratory) · Jie Zhou (None) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Xiaogang Wang (The Chinese University of Hong Kong) · Hongsheng Li (The Chinese University of Hong Kong) · Lewei Lu (SenseTime) · Jifeng Dai (Tsinghua University, Tsinghua University)
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen (University of Oxford) · Junyu Xie (University of Oxford) · Iro Laina (University of Oxford) · Andrea Vedaldi (University of Oxford)
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li (None) · Tobias Fischer (ETH Zurich) · Mattia Segu (ETH Zurich - Swiss Federal Institute of Technology) · Marc Pollefeys (ETH Zurich / Microsoft) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Federico Tombari (Google, TUM)
Robust Self-calibration of Focal Lengths from the Fundamental Matrix
Viktor Kocur (Comenius University in Bratislava) · Daniel Kyselica (Comenius University in Bratislava) · Zuzana Kukelova (Czech Technical University in Prague)
Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo
Zongrui Li (Nanyang Technological University) · Zhan Lu (Nanyang Technological University) · Haojie Yan (Zhejiang University) · Boxin Shi (Peking University) · Gang Pan (Zhejiang University) · Qian Zheng (Zhejiang University) · Xudong Jiang (Nanyang Technological University)
GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen (The University of Hong Kong) · Mengmeng Xu (Meta AI) · Jiawei Ren (Nanyang Technological University) · Yuren Cong (Institute of Information Processing, Leibniz University Hanover) · Sen He (Meta AI) · Yanping Xie (Meta) · Animesh Sinha (Meta AI) · Ping Luo (The University of Hong Kong) · Tao Xiang (University of Surrey) · Juan-Manuel Pérez-Rúa (Meta AI)
Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu (Tianjin University) · Yang Sun (Tianjin University) · Bing Cao (Tianjin University) · Qinghua Hu (Tianjin University)
Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields
Joshua Ahn (University of Chicago) · Haochen Wang (Toyota Technological Institute at Chicago) · Raymond A. Yeh (Purdue University) · Greg Shakhnarovich (Toyota Technological Institute at Chicago)
MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation
Sumanth Udupa (Indian Institute of Science) · Prajwal Gurunath (Indian Institute of Science) · Aniruddh Sikdar (Indian Institute of Science) · Suresh Sundaram (Indian Institute of Science, Indian institute of science, Bangalore)
Real-Time Simulated Avatar from Head-Mounted Sensors
Zhengyi Luo (Carnegie Mellon University) · Jinkun Cao (Carnegie Mellon University) · Rawal Khirodkar (Meta) · Alexander Winkler (Meta) · Jing Huang (Facebook) · Kris Kitani (Carnegie Mellon University) · Weipeng Xu (Meta Reality Labs Research)
Multi-Space Alignments Towards Universal LiDAR Segmentation
Youquan Liu (Hochschule Bremerhaven) · Lingdong Kong (National University of Singapore) · Xiaoyang Wu (The University of Hong Kong) · Runnan Chen (None) · Xin Li (East China Normal University) · Liang Pan (Shanghai AI Lab) · Ziwei Liu (Nanyang Technological University) · Yuexin Ma (ShanghaiTech University)
BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection
Wenjie Wang (Zhejiang University) · Yehao Lu (Zhejiang University) · Guangcong Zheng (None) · Shuigenzhan (Zhejiang University) · Xiaoqing Ye (Baidu Inc.) · Zichang Tan (Baidu) · Jingdong Wang (Baidu) · Gaoang Wang (Zhejiang University) · Xi Li (Zhejiang University)
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
Linfang Zheng (University of Birmingham) · Tze Ho Elden Tse (University of Birmingham) · Chen Wang (Department of computer science, the University of Hong Kong) · Yinghan Sun (Southern University of Science and Technology) · Hua Chen (Southern University of Science and Technology) · Aleš Leonardis (University of Birmingham) · Wei Zhang (Southern University of Science and Technology of China) · Hyung Jin Chang ()
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong (National University of Singapore) · Youquan Liu (Hochschule Bremerhaven) · Lai Xing Ng (Institute for Infocomm Research (I2R), A*STAR) · Benoit Cottereau (CNRS) · Wei Tsang Ooi (National University of Singapore)
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang (University of Hong Kong) · Bo Wu (MIT-IBM Watson AI Lab) · Sunli Chen (Tsinghua University) · Zhenfang Chen (MIT-IBM Watson AI lab) · Haotian Guan (The University of Hong Kong) · Wei-Ning Lee (University of Hong Kong) · Li Erran Li (AWS AI, Amazon) · Chuang Gan (MIT-IBM Watson AI Lab)
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling
Xinhang Liu (HKUST) · Yu-Wing Tai (None) · Chi-Keung Tang (The Hong Kong University of Science and Technology) · Pedro Miraldo (None) · Suhas Lohit (Mitsubishi Electric Research Labs) · Moitreya Chatterjee (Mitsubishi Electric Research Labs)
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min (Peking University) · Dawei Zhao (Defense Innovation Institute) · Liang Xiao (Defense Innovation Institute) · Jian Zhao () · Xinli Xu (Hong Kong University of Science and Technology) · Zheng Zhu (Tsinghua University) · Lei Jin (Beijing University of Posts and Telecommunications) · Jianshu Li (Ant Group) · Yulan Guo (SUN YAT-SEN UNIVERSITY) · Junliang Xing (Tsinghua University) · Liping Jing (Beijing Jiaotong University) · Yiming Nie (National University of Defense Technology) · Bin Dai (National University of Defense Technology)
OpenStreetView-5M: The Many Roads to Global Visual Geolocation
Guillaume Astruc (ENPC/IGN/CNES) · Nicolas Dufour (Ecole Nationale des Ponts et Chausees) · Ioannis Siglidis (Ecole Nationale des Ponts et Chausees) · Constantin Aronssohn (ENPC, Ecole Nationale des Ponts et Chausées) · Nacim Bouia (Ecole Normale Superieure) · Stephanie Fu (University of California, Berkeley) · Romain Loiseau (IMAGINE - LIGM - ENPC, LASTIG - IGN) · Van Nguyen Nguyen (Ecole des Ponts ParisTech) · Charles Raude (ENPC, Ecole Nationale des Ponts et Chausees) · Elliot Vincent (Imagine (LIGM) - Willow (Inria)) · Lintao XU (Université Gustave Eiffel) · Hongyu Zhou (Ecole Nationale des Ponts et Chausees) · Loic Landrieu (ENPC, IGN)
Depth Prompting for Sensor-Agnostic Depth Estimation
Jin-Hwi Park (GIST) · Chanhwi Jeong (Gwangju Institute of Science and Technology) · Junoh Lee (Gwangju Institute of Science and Technology) · Hae-Gon Jeon (GIST)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang (National University of Singapore) · Mingli Zhu (The Chinese University of Hong Kong(Shen Zhen)) · Aishan Liu () · Baoyuan Wu (The Chinese University of Hong Kong, Shenzhen) · Xiaochun Cao (SUN YAT-SEN UNIVERSITY) · Ee-Chien Chang (National University of Singapore)
SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Hsuan-I Ho (ETHZ - ETH Zurich) · Jie Song (ETHZ - ETH Zurich) · Otmar Hilliges (None)
FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures
Lisa Mais (Max Delbrück Center for Molecular Medicine) · Peter Hirsch (Max Delbrück Center for Molecular Medicine) · Claire Managan (HHMI Janelia Research Campus) · Ramya Kandarpa (Environmental Resources Management (ERM)) · Josef Rumberger (Max Delbrück Center for Molecular Medicine) · Annika Reinke (German Cancer Research Center) · Lena Maier-Hein (German Cancer Research Center (DKFZ)) · Gudrun Ihrke (HHMI Janelia Research Campus) · Dagmar Kainmueller (Universität Potsdam)
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Bhargav Ghanekar (Rice University) · Salman Siddique Khan (Rice University) · Pranav Sharma (IIT Madras, India) · Shreyas Singh (Indian Institute of Technology, Madras) · Vivek Boominathan (Rice University) · Kaushik Mitra (Indian Institute of Technology, Madras, Dhirubhai Ambani Institute Of Information and Communication Technology) · Ashok Veeraraghavan (William Marsh Rice University)
HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild
Supreeth Narasimhaswamy (Stony Brook University, New York) · Huy Anh Nguyen (Stony Brook University) · Lihan Huang (University of Science and Technology of China) · Minh Hoai (State University of New York, Stony Brook)
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil (Ohio State University, Columbus) · Chan Hee Song (The Ohio State University) · Boyuan Zheng (Ohio State University, Columbus) · Xiang Deng (Google) · Yu Su (Ohio State University) · Wei-Lun Chao (Ohio State University)
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Xin Guo (Ant Group) · Jiangwei Lao (Ant Group) · Bo Dang (Wuhan University) · Yingying Zhang (Hikvision Research Institute) · Lei Yu (Ant Group) · Lixiang Ru (Ant Group) · Liheng Zhong (Ant Group) · Ziyuan Huang (National University of Singapore) · Kang Wu (Wuhan University) · Dingxiang Hu (MYbank) · HUIMEI HE (Ant Group) · Jian Wang (Institute of Automation, Chinese Academy of Sciences) · Jingdong Chen (Ant Group) · Ming Yang (Ant Group) · Yongjun Zhang (None) · Yansheng Li (Wuhan University)
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma (ETH Zurich, INSAIT Sofia) · Danda Paudel (INSAIT, Sofia University) · Ajad Chhatkuli (Swiss Federal Institute of Technology) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
Detector-Free Structure from Motion
Xingyi He (Zhejiang University) · Jiaming Sun (Image Derivative Inc.) · Yifan Wang (Zhejiang University) · Sida Peng (None) · Qixing Huang (University of Texas at Austin) · Hujun Bao (Zhejiang University) · Xiaowei Zhou (None)
Learned Lossless Image Compression based on Bit Plane Slicing
Zhe Zhang (Wuhan University) · Huairui Wang (Wuhan University) · Zhenzhong Chen (Wuhan University) · Shan Liu (Tencent Media Lab)
3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images
Yifang Men (Alibaba Group) · Hanxi Liu (Peking University) · Yuan Yao (Alibaba group) · Miaomiao Cui (Alibaba Group) · Xuansong Xie (Alibaba Group) · Zhouhui Lian (Peking University)
Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
Zhiheng Cheng (East China Normal University) · Qingyue Wei (Stanford University) · Hongru Zhu (None) · Yan Wang (East China Normal University) · Liangqiong Qu (The University of Hong Kong) · Wei Shao (University of Florida) · Yuyin Zhou (UC Santa Cruz)
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie (Shanghai University) · Shengye Yu (Shanghai University) · Qile He (Shanghai University) · Mengtian Li (Shanghai University)
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu (University of Science and Technology of China) · Huabin Liu (Shanghai Jiao Tong University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Xiao Sun (Shanghai Artificial Intelligence Laboratory)
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
Zike Wu (Nanyang Technological University) · Pan Zhou (Sea Group) · YI Xuanyu (National Technological University) · Xiaoding Yuan (Johns Hopkins University) · Hanwang Zhang (Nanyang Technological University)
Diffusion Time-step Curriculum for One Image to 3D Generation
YI Xuanyu (National Technological University) · Zike Wu (Nanyang Technological University) · Qingshan Xu (Nanyang Technological University) · Pan Zhou (Sea Group) · Joo Lim (I2R, A*STAR) · Hanwang Zhang (Nanyang Technological University)
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu (Tsinghua University) · Quan Sun (BAAI) · Xiaosong Zhang (Beijing Academy of Artificial Intelligence) · Yufeng Cui (Beihang University) · Fan Zhang (Beijing Academy of Artificial Intelligence) · Yue Cao (Beijing Academy of Artificial Intelligence) · Xinlong Wang (Beijing Academy of Artificial Intelligence) · Jingjing Liu (Tsinghua University, Tsinghua University)
Generative Multimodal Models are In-Context Learners
Quan Sun (BAAI) · Yufeng Cui (Beihang University) · Xiaosong Zhang (Beijing Academy of Artificial Intelligence) · Fan Zhang (Beijing Academy of Artificial Intelligence) · Qiying Yu (Tsinghua University) · Yueze Wang (Beijing Academy of Artificial Intelligence) · Yongming Rao (Tsinghua University) · Jingjing Liu (Tsinghua University, Tsinghua University) · Tiejun Huang (Peking University) · Xinlong Wang (Beijing Academy of Artificial Intelligence)
LoS: Local Structure Guided Stereo Matching
Kunhong Li (SUN YAT-SEN UNIVERSITY) · Longguang Wang (National University of Defense Technology) · Ye Zhang (SUN YAT-SEN UNIVERSITY) · Kaiwen Xue (Huawei Cloud Computing Technologies Co., Ltd.) · Shunbo Zhou (Huawei Technologies Ltd.) · Yulan Guo (SUN YAT-SEN UNIVERSITY)
Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes
Liqiong Wang (China Three Gorges University) · Jinyu Yang (University of Birmingham) · Yanfu Zhang (College of William and Mary) · Fangyi Wang (China Three Gorges University) · Feng Zheng (Southern University of Science and Technology)
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
Yun Liu (Tsinghua University) · Haolin Yang (Beijing University of Posts and Telecommunications) · Xu Si (Tsinghua University) · Ling Liu (Beijing Institute of Technology) · Zipeng Li (Tsinghua University, Tsinghua University) · Yuxiang Zhang (Tsinghua University, Tsinghua University) · Yebin Liu (Tsinghua University) · Li Yi ()
HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
Mengcheng Li (Tsinghua University, Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Yuxiang Zhang (Tsinghua University, Tsinghua University) · Ruizhi Shao (Tsinghua University, Tsinghua University) · Tao Yu (Tsinghua University, Tsinghua University) · Yebin Liu (Tsinghua University)
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Liangxiao Hu (Harbin Institute of Technology) · Hongwen Zhang (Beijing Normal University) · Yuxiang Zhang (Tsinghua University, Tsinghua University) · Boyao ZHOU (Tsinghua University) · Boning Liu (Department of Automation, Tsinghua University) · Shengping Zhang (Harbin Institute of Technology) · Liqiang Nie (Harbin Institute of Technology (Shenzhen))
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang (Tsinghua University, Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Liangxiao Hu (Harbin Institute of Technology) · Jiajun Zhang (Beijing University of Posts and Telecommunications) · Hongwei Yi (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Shengping Zhang (Harbin Institute of Technology) · Yebin Liu (Tsinghua University)
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping SUN (City University of Hong Kong) · Yanjun Wang (Shanghai Jiao Tong University) · Ailing Zeng (IDEA) · Wanqi Yin (SenseTime Research ) · Chen Wei (SenseTime International PTE. LTD.) · Wenjia Wang (University of Hong Kong) · Haiy Mei (None) · Chi LEUNG (City University of Hong Kong) · Ziwei Liu (Nanyang Technological University) · Lei Yang (The Chinese University of Hong Kong) · Zhongang Cai (Nanyang Technological University)
Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
Yapeng Li (Wuhan University) · Yong Luo (Wuhan University) · Zengmao Wang (Wuhan University) · Bo Du (Wuhan University)
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao (University of Science and Technology of China) · Jianpeng Zhang (None) · Yingda Xia (Alibaba Group) · Tony C. W. MOK (Alibaba DAMO Academy) · Zi Li (Alibaba DAMO Academy) · Xianghua Ye (Zhejiang University) · Le Lu (Alibaba Group) · Jian Zheng (Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences) · Yuxing Tang (Alibaba Group) · Ling Zhang (Alibaba Group)
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. MOK (Alibaba DAMO Academy) · Zi Li (Alibaba DAMO Academy) · Yunhao Bai () · Jianpeng Zhang (None) · Wei Liu (Alibaba Group) · Yan-Jie Zhou (DAMO Academy, Alibaba Group) · Ke Yan (Alibaba DAMO Academy) · Dakai Jin (Alibaba Group) · Yu Shi (China Medical University Shenyang) · Xiaoli Yin (China Medical University Shenyang) · Le Lu (Alibaba Group) · Ling Zhang (Alibaba Group)
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang (None) · Liangbin Xie (Macau) · Xintao Wang (Tencent) · Ziyang Yuan (Tsinghua University, Tsinghua University) · Xiaodong Cun (Tencent AI Lab) · Yixiao Ge (Tencent) · Jiantao Zhou (University of Macau) · Chao Dong (SIAT) · Rui Huang (The Chinese University of Hong Kong, Shenzhen) · Ruimao Zhang (The Chinese University of Hong Kong (Shenzhen)) · Ying Shan (Tencent)
LASO: Language-guided Affordance Segmentation on 3D Object
Yicong Li (National University of Singapore) · Na Zhao (Singapore University of Technology and Design) · Junbin Xiao (None) · Chun Feng (University of Science and Technology of China) · Xiang Wang (University of Science and Technology of China) · Tat-seng Chua (National University of Singapore)
Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao (Tsinghua University, Tsinghua University) · Jingxiang Sun (None) · Cheng Peng (Tsinghua University, Tsinghua University) · Zerong Zheng (Tsinghua University) · Boyao ZHOU (Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Yebin Liu (Tsinghua University)
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang (Northwest Polytechnical University Xi'an) · Ruizhi Shao (Tsinghua University, Tsinghua University) · Qi Zhang (Northwest Polytechnical University Xi'an) · Hongwen Zhang (Beijing Normal University) · Ying Feng (Northwest Polytechnical University Xi'an) · Yebin Liu (Tsinghua University) · Qing Wang (Northwestern Polytechnical University)
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Yuelang Xu (Tsinghua University, Tsinghua University) · Benwang Chen (Tsinghua University, Tsinghua University) · Zhe Li (Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Lizhen Wang (Tsinghua University, Tsinghua University) · Zerong Zheng (Tsinghua University) · Yebin Liu (Tsinghua University)
ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
Yichen Bai (None) · Zongbo Han (Tianjin University) · Bing Cao (Tianjin University) · Xiaoheng Jiang (Zhengzhou University) · Qinghua Hu (Tianjin University) · Changqing Zhang (Tianjin University)
Diffusion Models Without Attention
Jing Nathan Yan (Cornell University) · Jiatao Gu (Apple (MLR)) · Alexander Rush (Cornell Tech)
HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
Yi ZHOU (Samsung Research China-Beijing(SRCB)) · Hui Zhang (Samsung Research China-Beijing(SRCB)) · Jiaqian Yu (Samsung R&D Institute China - Beijing) · yifan yang (Samsung) · Sangil Jung (Samsung) · Seung-In Park (Samsung Advanced Institute of Technology) · ByungIn Yoo (Samsung Advanced Institute of Technology)
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
Minyoung Hwang (Seoul National University) · Luca Weihs (Allen Institute for Artificial Intelligence) · Chanwoo Park (Massachusetts Institute of Technology) · Kimin Lee (KAIST) · Aniruddha Kembhavi (Allen Institute for Artificial Intelligence) · Kiana Ehsani (Allen Institute for Artificial Intelligence)
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Romain Loiseau (IMAGINE - LIGM - ENPC, LASTIG - IGN) · Elliot Vincent (Imagine (LIGM) - Willow (Inria)) · Mathieu Aubry (ENPC) · Loic Landrieu (ENPC, IGN)
Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation
嘉鹏 苏 (Harbin Institute of Technology) · Qi Fan (The Hong Kong University of Science and Technology) · Wenjie Pei (Harbin Institute of Technology) · Guangming Lu (Harbin Institute of Technology, Shenzhen) · Fanglin Chen (Harbin Institute of Technology (Shenzhen))
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori (University of Trento) · Alessandro Conti (University of Trento) · Paolo Rota (University of Trento) · Yiming Wang (Fondazione Bruno Kessler) · Elisa Ricci (University of Trento)
GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection
Xiaotian Li (Nanjing University of Posts and Telecommunications) · Baojie Fan (Nanjing University of Posts and Telecommunications) · Jiandong Tian (The Shenyang Institute of Automation, Chinese Academy of Sciences) · Huijie Fan (None)
Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
Dipam Goswami (Computer Vision Center) · Albin Soutif (Computer Vision Center, Universitat Autònoma de Barcelona) · Yuyang Liu (Shenyang Institute of Automation, Chinese Academy of Sciences/ University of Chinese Academy of Sciences) · Sandesh Kamath (Computer Vision Center, Universitat Autónoma de Barcelona) · Bartłomiej Twardowski (Computer Vision Center / IDEAS NCBR) · Joost van de Weijer (Computer Vision Center Barcelona)
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo (Harvard Ophthalmology AI Lab) · MIN SHI (Harvard University) · Muhammad Osama Khan (New York University) · Muhammad Muneeb Afzal (New York University) · Hao Huang (New York University) · Shuaihang Yuan (New York University) · Yu Tian (None) · Luo Song (Mass Eye and Ear) · Ava Kouhana (Harvard Ophthalmology AI lab) · Tobias Elze (Harvard University) · Yi Fang (New York University) · Mengyu Wang (Harvard University)
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang (UIUC) · Chejian Xu (University of Illinois at Urbana-Champaign) · Bo Li (UIUC)
Unsupervised Universal Image Segmentation
Xudong Wang (Electrical Engineering & Computer Science Department, University of California Berkeley) · Dantong Niu (University of California, Berkeley) · Xinyang Han (UC Berkeley) · Long Lian (University of California, Berkeley) · Roei Herzig (Tel Aviv University) · Trevor Darrell (Electrical Engineering & Computer Science Department)
Finsler-Laplace-Beltrami Operators with Application to Shape Analysis
Simon Weber (Technische Universität München) · Thomas Dagès (Technion - Israel Institute of Technology) · Maolin Gao (None) · Daniel Cremers (Technical University Munich)
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren Yang (Apple) · Anurag Ranjan (Apple) · Jen-Hao Rick Chang (Apple) · Raviteja Vemulapalli (None) · Oncel Tuzel (Apple)
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve (Amazon) · Fan Fei (Amazon) · Jayakrishnan Unnikrishnan (Amazon) · Son Dinh Tran (Amazon) · Benjamin Yao (Amazon) · Belinda Zeng (Amazon) · Mubarak Shah (University of Central Florida) · Trishul Chilimbi (Department of Computer Science, University of Wisconsin - Madison)
4SAVED - Four Seasons Autonomous Vehicle Environment Dataset
Daniel Kent (Michigan State University) · Mohammed Alyaqoub (Michigan State University) · Xiaohu Lu (Michigan State University) · Sayed Khatounabadi (Michigan State University) · Kookjin Sung (Michigan State University) · Cole Scheller (Michigan State University) · Alexander Dalat (University of Michigan - Ann Arbor) · Xinwei Guo (Michigan State University) · Asma Bin Thabit (Michigan State University) · Roberto Muntaner Whitley (Michigan State University) · Hayder Radha (Michigan State University)
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang (The Hong Kong University of Science and Technology) · Changkun Liu (Hong Kong University of Science and Technology) · Yipeng Zhu (Hong Kong University of Science and Technology) · Hui Cheng (SUN YAT-SEN UNIVERSITY) · Tristan Braud (Hong Kong University of Science and Technology) · Sai-Kit Yeung (The Hong Kong University of Science and Technology (HKUST))
MeshPose: Unifying DensePose and 3D Body Mesh reconstruction
Eric-Tuan Le (University College London) · Antonios Kakolyris (Snap Inc.) · Petros Koutras (Snap Inc.) · Himmy Tam (Snap Inc.) · Efstratios Skordos (Snap Inc.) · George Papandreou (Snap Inc.) · Riza Alp Guler (Snap Inc.) · Iasonas Kokkinos (Snap Inc.)
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu (Huawei) · Yongxin Yang (Queen Mary University of London) · Shifeng Zhang (Huawei Technologies Ltd.) · Fei Chen (Huawei Noah's Ark Lab) · Steven McDonagh (University of Edinburgh) · Gerasimos Lampouras (Huawei Technologies Ltd.) · Ignacio Iacobacci (Huawei Noah's Ark Lab) · Sarah Parisot (Huawei)
LightIt: Illumination Modeling and Control for Diffusion Models
Peter Kocsis (None) · Kalyan Sunkavalli (Adobe Research) · Julien Philip (Adobe Systems) · Matthias Nießner (Technical University of Munich) · Yannick Hold-Geoffroy (Adobe Research)
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Baptiste Brument (IRIT, University of Toulouse, France) · Robin Bruneau (University of Copenhagen) · Yvain Queau (CNRS) · Jean Mélou (IRIT) · Francois Lauze (Department of Computer Science, University of Copenhagen) · Jean-Denis Durou (IRIT) · Lilian Calvet (OR-X, Balgrist Hospital, University of Zurich)
CaDeT: a Causal Disentanglement Approach for Robust Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz (None) · Junrui Zhang (University of Toronto) · Amir Rasouli (Huawei Technologies Canada)
Text-Guided 3D Face Synthesis - From Generation to Editing
Yunjie Wu (NetEase, Inc.) · Yapeng Meng (Tsinghua University) · Zhipeng Hu (Leihuo Game, NetEase) · Lincheng Li (None) · Haoqian Wu (NetEase Fuxi AI Lab) · Kun Zhou (Zhejiang University) · Weiwei Xu (Zhejiang University) · Xin Yu (University of Queensland)
EgoGen: An Egocentric Synthetic Data Generator
Gen Li (ETH Zurich) · Kaifeng Zhao (ETHZ - ETH Zurich) · Siwei Zhang (ETH Zurich) · Xiaozhong Lyu (Department of Computer Science, ETHZ - ETH Zurich) · Mihai Dusmanu (Microsoft) · Yan Zhang (ETH Zurich) · Marc Pollefeys (ETH Zurich / Microsoft) · Siyu Tang (ETH Zurich)
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Muhammad Sohail Danish (Mohamed bin Zayed University of Artificial Intelligence) · Muhammad Haris Khan (None) · Muhammad Akhtar Munir (None) · M. Sarfraz (Karlsruher Institut für Technologie) · Mohsen Ali (Information Technology University)
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
Hao Li (Northwest Polytechnical University) · Dingwen Zhang (Northwestern Polytechnical University) · Yalun Dai (Nanyang Technological University) · Nian Liu (Mohamed bin Zayed University of Artificial Intelligence) · Lechao Cheng (Hefei University of Technology) · Li Jingfeng (Northwest Polytechnical University Xi'an) · Jingdong Wang (Baidu) · Junwei Han (Northwestern Polytechnical University, Tsinghua University)
Validating Privacy-Preserving Face Recognition under a Minimum Assumption
Hui Zhang (Anhui University) · Xingbo Dong (Anhui University) · YenLungLai (Anhui University) · Ying Zhou (Anhui University) · Xiaoyan ZHANG (Anhui University) · Xingguo Lv (Anhui University) · Zhe Jin (Anhui University) · Xuejun Li (Anhui University)
3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
Zhicheng Lu (Northwest Polytechnical University Xi'an) · xiang guo (Northwest Polytechnical University Xi'an) · Le Hui (Nanjing University Of Science And Technology) · Tianrui Chen (Northwest Polytechnical University Xi'an) · Min Yang (None) · Xiao Tang (None) · feng zhu (None) · Yuchao Dai (Northwestern Polytechnical University)
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
Yichen Yao (ShanghaiTech University) · Zimo Jiang (ShanghaiTech University) · YUJING SUN (The University of Hong Kong) · Zhencai Zhu (Innovation Academy for Microsatellites) · Xinge Zhu (The Chinese University of Hong Kong) · Runnan Chen (None) · Yuexin Ma (ShanghaiTech University)
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Yujun Shi (National University of Singapore) · Chuhui Xue (ByteDance Inc.) · Jun Hao Liew (ByteDance) · Jiachun Pan (National University of Singapore) · Hanshu Yan (ByteDance) · Wenqing Zhang (Huazhong University of Science and Technology) · Vincent Y. F. Tan (National University of Singapore) · Song Bai (ByteDance)
Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners
Junhao Dong (Nanyang Technological University) · Piotr Koniusz (Australian National University) · Junxi Chen (SUN YAT-SEN UNIVERSITY) · Xiaohua Xie (SUN YAT-SEN UNIVERSITY) · Yew-Soon Ong (Nanyang Technological University)
Generative Latent Coding for Ultra-Low Bitrate Image Compression
Zhaoyang Jia (University of Science and Technology of China) · Jiahao Li (Microsoft Research Asia) · Bin Li (Microsoft) · Houqiang Li (University of Science and Technology of China) · Yan Lu (Microsoft Research Asia)
SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation
Yanzhe Liu (None) · Rong Chen (Dalian Maritime University) · Yushi Li (Xi'an Jiaotong-Liverpool University) · Yixi Li (Dalian Maritime University) · Xuehou Tan (Tokai University)
Differentiable Point-based Inverse Rendering
Hoon-Gyu Chung (POSTECH) · Seokjun Choi (Pohang University of Science and Technology) · Seung-Hwan Baek (POSTECH)
GS-IR: 3D Gaussian Splatting for Inverse Rendering
Zhihao Liang (South China University of Technology) · Qi Zhang (Tencent AI Lab) · Ying Feng (Tencent AI Lab) · Ying Shan (Tencent) · Kui Jia (South China University of Technology)
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du (Beijing University of Posts and Telecommunications) · Sicheng Zhang (Beijing University of Posts and Telecommunications) · Binzhu Xie (Beijing University of Posts and Telecommunications) · Guoshun Nan (Beijing University of Posts and Telecommunications) · Jiayang Zhang (Beijing University of Posts and Telecommunications) · Junrui Xu (Beijing University of Posts and Telecommunications) · Hangyu Liu (Beijing University of Posts and Telecommunications) · Sicong Leng (Nanyang Technological University) · Jiangming Liu (Yunnan University) · Hehe Fan (None) · Dajiu Huang (South China University) · Jing Feng (Beijing University of Posts and Telecommunications) · Linli Chen (Sichuan University) · Can Zhang (Beijing University of Posts and Telecommunications) · Xuhuan Li (Beijing University of Posts and Telecommunications) · Hao Zhang (Beijing University of Posts and Telecommunications) · Jianhang Chen (Beijing University of Posts and Telecommunications) · Qimei Cui (Beijing University of Posts and Telecommunications) · Xiaofeng Tao (Beijing University of Posts and Telecommunications)
Long-Tail Class Incremental Learning via Independent Sub-prototype Construction
Xi Wang (Xidian University) · Xu Yang (Xi'an University of Electronic Science and Technology) · jie yin (None) · Kun Wei (Xidian University) · Cheng Deng (Xidian University)
Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng (ByteDance) · Guoqiang Wei (ByteDance) · Jiani Zheng (None) · Jiaxin Zou (ByteDance Ltd.) · Yang Wei (East China Normal University) · Yuchen Zhang (ByteDance Research) · Hang Li (ByteDance Technology)
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
Hyunwoo Ryu (None) · Jiwoo Kim (Yonsei University) · Hyunseok An (Yonsei University) · Junwoo Chang (Yonsei University) · Joohwan Seo (University of California, Berkeley) · Taehan Kim (Samsung) · Yubin Kim (Massachusetts Institute of Technology) · Chaewon Hwang (Ewha Women's University) · Jongeun Choi (Yonsei University) · Roberto Horowitz (University of California, Berkeley)
Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution
Guangyuan Li (Zhejiang University) · Chen Rao (Zhejiang University) · Juncheng Mo (Zhejiang University) · Zhanjie Zhang (Zhejiang University) · Wei Xing (Zhejiang University) · Lei Zhao (Zhejiang University)
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen (Shanghai Artificial Intelligence Laboratory) · Guanglu Song (Sensetime X-Lab) · Zeyue Xue (The University of Hong Kong) · Fu-Yun Wang (The Chinese University of Hong Kong) · Yu Liu (The Chinese University of Hong Kong)
Learning Equi-angular Representations for Online Continual Learning
Minhyuk Seo (Yonsei University) · Hyunseo Koh (Gwangju Institute of Science and Technology) · Wonje Jeung (Yonsei University) · Minjae Lee (Yonsei University) · San Kim (Yonsei University) · Hankook Lee (Sungkyunkwan University) · Sungjun Cho (LG AI Research) · Sungik Choi (LG AI Research) · Hyunwoo Kim (Zhejiang Lab) · Jonghyun Choi (Seoul National University)
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Rui Song (Technical University of Munich) · Chenwei Liang (Fraunhofer) · Hu Cao (Technical University of Munich) · Zhiran Yan (Technische Hochschule Ingolstadt) · Walter Zimmer (Technical University of Munich (TUM)) · Markus Gross (Fraunhofer IVI) · Andreas Festag (Technische Hochschule Ingolstadt) · Alois Knoll (Technical University Munich)
CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li (Shanghai AI Lab) · Jianglin Fu (None) · Kaiyuan Liu (None) · Wentao Wang (Shanghai AI Laboratory) · Kwan-Yee Lin (The Chinese University of Hong Kong) · Wayne Wu (None)
Improving Bird’s Eye View Semantic Segmentation by Task Decomposition
Tianhao Zhao (Wuhan University) · Yongcan Chen (Wuhan University) · Yu Wu (Wuhan University) · Tianyang Liu (Wuhan University) · Bo Du (Wuhan University) · Peilun Xiao (Didi Research) · shi qiu (None) · Hongda Yang (Beijing DiDi Infinity Technology and Development Co., Ltd.) · Guozhen Li (Didi Global) · yi yang (Didi Global) · Yutian Lin (Wuhan University)
Neural Video Compression with Feature Modulation
Jiahao Li (Microsoft Research Asia) · Bin Li (Microsoft) · Yan Lu (Microsoft Research Asia)
GenesisTex: Adapting Image Denoising Diffusion to Texture Space
Chenjian Gao (None) · Boyan Jiang (Fudan University) · Xinghui Li (Tsinghua University) · YingPeng Zhang (South China University of Technology) · Qian Yu (Beihang University)
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
Yu Wang (Baidu) · Xin Li (Baidu) · Shengzhao Wen (Baidu) · gang zhang (Baidu Inc.) · Haixiao Yue (Baidu) · Haocheng Feng (Baidu) · Junyu Han (Baidu) · Errui Ding (Baidu Inc.)
Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng (None) · Pan Gao (Nanjing University of Aeronautics and Astronautics, Tsinghua University) · Peng Zhou (Nanjing University of Aeronautics and Astronautics) · Jie Qin (Nanjing University of Aeronautics and Astronautics)
Efficient Test-Time Adaptation of Vision-Language Models
Adilbek Karmanov (Mohamed bin Zayed University of Artificial Intelligence) · Dayan Guan (Nanyang Technological University) · Shijian Lu (Nanyang Technological University) · Abdulmotaleb El Saddik (Mohamed bin Zayed University of Artificial Intelligence) · Eric P. Xing (Mohamed bin Zayed University of AI)
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Xianqi Wang (Huazhong University of Science and Technology) · Gangwei Xu (Huazhong University of Science and Technology) · Hao Jia (Huazhong University of Science and Technology) · Xin Yang (Huazhong University of Science and Technology)
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz (None) · Mohammad Sabokrou (Okinawa Institute of Science and Technology (OIST)) · Amir Rasouli (Huawei Technologies Canada)
SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream
Lin Zhu (Beijing Institute of Technology) · Kangmin Jia (Beijing Institute of Technology) · Yifan Zhao (Beihang University) · Yunshan Qi (Beihang University) · Lizhi Wang (None) · Hua Huang (Beijing Normal University)
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yonglu Li (Shanghai Jiaotong University) · Xiaoqian Wu (None) · Xinpeng Liu (Shanghai Jiao Tong University) · Zehao Wang (Shanghai Jiao Tong University) · Yiming Dou (University of Michigan - Ann Arbor) · Yikun Ji (Shanghai Jiaotong University) · Junyi Zhang (Shanghai Jiao Tong University) · Yixing Li (Shanghai Jiao Tong University) · Xudong LU (The Chinese University of Hong Kong) · Jingru Tan (Central South University) · Cewu Lu (Shanghai Jiao Tong University)
Implicit Motion Function
Yue Gao (Microsoft Research) · Jiahao Li (Microsoft Research Asia) · Lei Chu (Microsoft Research Asia) · Yan Lu (Microsoft Research Asia)
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin (Delft University of Technology) · Zimin Xia (Motional)
Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation
Xin Kang (None) · Lei Chu (Microsoft Research Asia) · Jiahao Li (Microsoft Research Asia) · Xuejin Chen (University of Science and Technology of China) · Yan Lu (Microsoft Research Asia)
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih (University of Washington) · Wei-Chiu Ma (Cornell University) · Lorenzo Boyice (Google) · Aleksander Holynski (UC Berkeley & Google Research) · Forrester Cole (Google) · Brian Curless (University of Washington) · Janne Kontkanen (Research, Google)
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
Jingyao Xu (Beijing Jiaotong University) · Yuetong Lu (Beijing Jiaotong University) · Yandong Li (Google Research) · Siyang Lu (Beijing Jiaotong University) · Dongdong Wang (University of Central Florida) · Xiang Wei (Beijing Jiaotong University)
Intensity-Robust Autofocus for Spike Camera
Changqing Su (Peking University) · Zhiyuan Ye (Nanchang Hangkong University) · Yongsheng Xiao (Nanchang Hangkong University) · You Zhou (Nanjing University) · Zhen Cheng (Tsinghua University) · Bo Xiong (Peking University) · Zhaofei Yu (Peking University) · Tiejun Huang (Peking University)
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Hossain Hossain (None) · Mennatullah Siam (None) · Leonid Sigal (University of British Columbia) · Jim Little (University of British Columbia, Canada)
S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
Xuyang Li (None) · Danfeng Hong (Chinese Academy of Sciences, Aerospace Information Research Institute) · Jocelyn Chanussot (INRIA)
Object Pose Estimation via the Aggregation of Diffusion Features
Tianfu Wang (University of Chinese Academy of Sciences) · Guosheng Hu (Oosto) · Hongguang Wang (Shenyang Institute of Automation)
FSC: Few-point Shape Completion
Xianzu Wu (Jianghan University) · Xianfeng Wu (Jianghan University) · Tianyu Luan (State University of New York at Buffalo) · Yajing Bai (Jianghan University) · Zhongyuan Lai (Jianghan University) · Junsong Yuan (State University of New York at Buffalo)
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
Peng Lu (SIGS, Tsinghua University) · Tao Jiang (Shanghai AI Laboratory) · Yining Li (Shanghai AI Laboratory) · Xiangtai Li (Nanyang Technological University) · Kai Chen (Shanghai AI Laboratory) · Wenming Yang (Tsinghua University)
Resolution Limit of Single-Photon LIDAR
Stanley H. Chan (Purdue University, USA) · Hashan K Weerasooriya (Purdue University) · Weijian Zhang (Purdue University) · Pamela Abshire (University of Maryland, College Park) · Istvan Gyongy (University of Edinburgh) · Robert Henderson (University of Edinburgh)
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng (Zhejiang University) · Yintong Shang (University of Utah) · Xuan Li (None) · Tianjia Shao (Zhejiang University) · Chenfanfu Jiang (University of California, Los Angeles) · Yin Yang (University of Utah)
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi (Xi'an Jiaotong University) · Chen Li (Xi'an Jiaotong University) · Tieliang Gong (Xi'an Jiaotong University) · Yefeng Zheng (None) · Huazhu Fu (Institute of High Performance Computing, Singapore, A*STAR)
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision
Chenyangguang Zhang (Tsinghua University) · Guanlong Jiao (Tsinghua University) · Yan Di (Technische Universität München) · Gu Wang (Tsinghua University) · Ziqin Huang (Tsinghua University) · Ruida Zhang (Department of Automation, Tsinghua University) · Fabian Manhardt (Google) · Bowen Fu (Technische Universität München) · Federico Tombari (Google, TUM) · Xiangyang Ji (Tsinghua University)
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe (None) · Satya Narayan Shukla (Meta AI) · Omid Poursaeed (Meta AI) · Michael Ryoo (Stony Brook University) · Tsung-Yu Lin (Department of Computer Science, University of Massachusetts, Amherst)
Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person View
Ziwei Zhao (Indiana University) · Yuchen Wang (Indiana University) · Chuhua Wang (Indiana University, Bloomington)
AlignMiF: Geometry-Aligned Multimodal Implicit Field for Enhanced LiDAR-Camera Joint Synthesis
Tao Tang (SYSU) · Guangrun Wang (University of Oxford) · Yixing Lao (None) · Peng Chen (Alibaba Group) · Jie Liu (North China University of Technology) · Liang Lin (SUN YAT-SEN UNIVERSITY, Tsinghua University) · Kaicheng Yu (Alibaba Group) · Xiaodan Liang (Sun Yat-sen University)
GOAT-Bench: A Benchmark for Multi-modal Lifelong Navigation
Mukul Khanna (Georgia Institute of Technology) · Ram Ramrakhya (None) · Gunjan Chhablani (Georgia Institute of Technology) · Sriram Yenamandra (Georgia Institute of Technology) · Theo Gervet (Carnegie Mellon University) · Matthew Chang (University of Illinois, Urbana Champaign) · Zsolt Kira (Georgia Institute of Technology) · Devendra Singh Chaplot (Carnegie Mellon University) · Dhruv Batra (FAIR (Meta) and Georgia Tech) · Roozbeh Mottaghi (Meta)
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem (Brown University) · Conor McCullough (Google) · Randy Hsin (Google) · Chas Leichner (Google) · Shan Li (Google) · In Suk Chong (Google) · Andrew Howard (Google) · Lukasz Lew (Research, Google) · Sherief Reda (Brown University) · Ville-Mikko Rautio (Google) · Daniele Moro (Google Research)
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Minghan Li (The Hong Kong Polytechnic University) · Shuai Li (The Hong Kong Polytechnic University) · Xindong Zhang (The Hong Kong Polytechnic University) · Lei Zhang (The Hong Kong Polytechnic University)
Don’t drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour (Ecole Nationale des Ponts et Chaussées) · Victor Besnier (Valeo.ai) · Vicky Kalogeiton (Ecole polytechnique, IP Paris) · David Picard (None)
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
Weijia Li (Sun Yat-sen University) · Haote Yang (PJLab) · Zhenghao Hu (SUN YAT-SEN UNIVERSITY) · Juepeng Zheng (Sun Yat-Sen University) · Gui-Song Xia (Wuhan University) · Conghui He (None)
In Search of a Data Transformation That Accelerates Neural Field Training
Junwon Seo (None) · Sangyoon Lee (POSTECH) · Kwang In Kim (Pohang University of Science and Technology) · Jaeho Lee (POSTECH)
Learning the 3D Fauna of the Web
Zizhang Li (Zhejiang University) · Dor Litvak (University of Texas at Austin) · Ruining Li (University of Oxford) · Yunzhi Zhang (Stanford University) · Tomas Jakab (University of Oxford) · Christian Rupprecht (University of Oxford) · Shangzhe Wu (Stanford University) · Andrea Vedaldi (University of Oxford) · Jiajun Wu (Stanford University)
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang (Princeton University) · Bhishma Dedhia (Princeton University) · Niraj Jha (Princeton University)
Modular Blind Video Quality Assessment
Wen Wen (City University of Hong Kong) · Mu Li (The Chinese University of Hong Kong, Shenzhen) · Yabin ZHANG (Bytedance) · Yiting Liao (Bytedance) · Junlin Li (ByteDance Inc.) · Li zhang (Bytedance Inc.) · Kede Ma (City University of Hong Kong)
Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions
Yujeong Chae (KAIST) · Hyeonseong Kim (KAIST) · Kuk-Jin Yoon (KAIST)
SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
Jan-Niklas Dihlmann (Eberhard-Karls-Universität Tübingen) · Andreas Engelhardt (University of Tübingen) · Hendrik Lensch (University of Tübingen)
Synergistic Global-space Camera and Human Reconstruction from Videos
Yizhou Zhao (Carnegie Mellon University) · Tuanfeng Y. Wang (None) · Bhiksha Raj (Carnegie Mellon University) · Min Xu (Carnegie Mellon University) · Jimei Yang (Adobe Research) · Chun-Hao P. Huang (Adobe Systems)
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Honghao Chen (Institute of Automation, Chinese Academy of Sciences) · Xiangxiang Chu (MeiTuan) · Renyongjian (University of the Chinese Academy of Sciences) · Xin Zhao (University of Science and Technology Beijing) · Kaiqi Huang (Institute of Automation, Chinese Academy of Sciences)
Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige (Queensland University of Technology) · Sanoojan Baliah (Mohamed bin Zayed University of Artificial Intelligence) · Malitha Gunawardhana (University of Auckland) · Muhammad Haris Khan (None)
Towards Detailed and Robust 3D Clothed Human Reconstruction with High-Frequency and Low-Frequency Information of Parametric Body Models
Yifan Yang (South China University of Technology) · Dong Liu (South China University of Technology) · Shuhai Zhang (South China University of Technology) · Zeshuai Deng (SCUT) · Zixiong Huang (South China University of Technology) · Mingkui Tan (South China University of Technology)
Snapshot Lidar: Fourier embedding of phasors for single-image depth reconstruction
Sarah Friday (Dartmouth College) · Yunzi Shi (Dartmouth College) · Yaswanth Kumar Cherivirala (Univ. of Michigan/NVIDIA) · Vishwanath Saragadam (University of California, Riverside) · Adithya Pediredla (Dartmouth College)
Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments
Yusuke Takimoto (Huawei Technologies Japan K.K.) · Hikari Takehara (Huawei Technologies Japan K.K.) · Hiroyuki Sato (Huawei Technologies Japan K.K.) · Zihao Zhu (Keio University) · Bo Zheng (Huawei Technologies Japan)
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Ziyao Huang (Chinese Academy of Sciences) · Fan Tang (Institute of Computing Technology, CAS) · Yong Zhang (Tencent AI Lab) · Xiaodong Cun (Tencent AI Lab) · Juan Cao (Institute of Computing Technology, Chinese Academy of Sciences) · Jintao Li (Institute of Computing Technology, Chinese Academy of Sciences) · Tong-yee Lee (National Cheng Kung University)
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu (The Chinese University of Hong Kong) · Jinliang Zheng (Tsinghua University) · Yu Liu (The Chinese University of Hong Kong) · Hongsheng Li (The Chinese University of Hong Kong)
Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes
YuJie Lu (Donghua University, Shanghai) · Long Wan (Donghua University, Shanghai) · Nayu Ding (Donghua University, Shanghai) · Yulong Wang (Donghua University, Shanghai) · Shuhan Shen (Institute of Automation, Chinese Academy of Sciences) · Shen Cai (Donghua University) · Lin Gao (University of Chinese Academy of Sciences)
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh (Korea University) · Hwanhee Jung (Korea University) · Giljoo Nam (Meta) · Jinseop Yeom (Korea University) · Hyunje Park (Korea University) · Sang Ho Yoon (KAIST) · Sangpil Kim (Korea University)
In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification
Jinseong Park (Seoul National University) · Yujin Choi (Seoul National University) · Jaewook Lee (Seoul National University)
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen (Huawei) · Dapeng Chen (Huawei Technologies Ltd.) · Ruijin Liu (Xi'an Jiaotong University) · Sai Zhou (Huawei Technologies Ltd.) · Wenyuan Xue (Huawei Technologies Ltd.) · Wei Peng (Huawei Technologies Ltd.)
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi (National University of Singapore) · Zehong Yan (National University of Singapore) · Wynne Hsu (National University of Singapore) · Mong Li Lee (National University of Singapore)
Spatial-Aware Regression for Keypoint Localization
Dongkai Wang (Peking University) · Shiliang Zhang (Peking University)
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang (Yale University) · Chao Feng (None) · Ziyang Chen (University of Michigan) · Hyoungseob Park (Yale University) · Daniel Wang (Yale University) · Yiming Dou (University of Michigan - Ann Arbor) · Ziyao Zeng (Yale University) · xien chen (Yale University) · Suchisrit Gangopadhyay (Yale University) · Andrew Owens (University of Michigan) · Alex Wong (Yale University)
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev (Meta) · Antoni Bigata Casademunt (Imperial College London) · Konstantinos Vougioukas (Facebook) · Zoe Landgraf (Facebook) · Stavros Petridis (Facebook) · Maja Pantic (Facebook)
Shadow-Enlightened Image Outpainting
Hang Yu (Shanghai University) · Ruilin Li (None) · Shaorong Xie (Shanghai University) · Jiayan Qiu (University of Leicester)
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang (School of Artificial Intelligence, University of Chinese Academy of Sciences) · Xing Nie (Institute of Automation, Chinese Academy of Sciences) · Tong Li (Meituan) · Gaopengfei (Beijing SanKuai Online Technology Co., Ltd.) · Ying Guo (Meituan) · Cheng Zhen (Meituan) · Pengfei Yan (Meituan) · Shiming Xiang (Institute of Automation, Chinese Academy of Sciences)
D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval
Yi Xie (South China University of Technology) · Yihong Lin (South China University of Technology) · Wenjie Cai · Xuemiao Xu (South China University of Technology) · Huaidong Zhang (South China University of Technology) · Yong Du (Ocean University of China) · Shengfeng He (Singapore Management University)
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
Sijia Chen (Huazhong University of Science and Technology) · En Yu (Huazhong University of Science and Technology) · Jinyang Li (Huazhong University of Science and Technology) · Wenbing Tao (Huazhong University of Science and Technology)
Single-View Refractive Index Tomography with Neural Fields
Brandon Zhao (California Institute of Technology) · Aviad Levis (California Institute of Technology) · Liam Connor (California Institute of Technology) · Pratul P. Srinivasan (Google Research) · Katherine Bouman (California Institute of Technology)
MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning
Ahmed Agiza · Marina Neseem (Brown University) · Sherief Reda (Brown University)
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Zuoyue Li (ETH Zürich) · Zhenqiang Li (The University of Tokyo) · Zhaopeng Cui · Marc Pollefeys (ETH Zurich / Microsoft) · Martin R. Oswald (University of Amsterdam)
Adaptive Softassign via Hadamard-Equipped Sinkhorn
Binrui Shen (Xi'an Jiaotong-Liverpool University) · Qiang Niu (Xi'an Jiaotong-Liverpool University) · Shengxin Zhu (Beijing Normal University)
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras (NVIDIA) · Miika Aittala (NVIDIA) · Jaakko Lehtinen (Aalto University & NVIDIA) · Janne Hellsten (NVIDIA) · Timo Aila (NVIDIA) · Samuli Laine (NVIDIA)
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han (The Chinese University of Hong Kong) · Kaixiong Gong · Yiyuan Zhang (The Chinese University of Hong Kong) · Jiaqi Wang (Shanghai AI Laboratory) · Kaipeng Zhang (Shanghai AI Laboratory) · Dahua Lin (The Chinese University of Hong Kong) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Peng Gao (The Chinese University of Hong Kong) · Xiangyu Yue (The Chinese University of Hong Kong)
See, Say, and Segment: Correcting False Premises with LMMs
Tsung-Han Wu (University of California, Berkeley) · Giscard Biamby (University of California, Berkeley) · David Chan (University of California Berkeley) · Lisa Dunlap (University of California, Berkeley) · Ritwik Gupta (Defense Innovation Unit) · Xudong Wang (Electrical Engineering & Computer Science Department, University of California Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Joseph Gonzalez (University of California - Berkeley)
Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
Suyeon Kim (Pohang University of Science and Technology) · Dongha Lee (Yonsei University) · SeongKu Kang (University of Illinois Urbana-Champaign) · Sukang Chae (Pohang University of Science and Technology) · Sanghwan Jang (POSTECH) · Hwanjo Yu (POSTECH)
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang (Shanghai Jiaotong University) · Songpengcheng Xia · Lei Chu (University of Southern California) · Jiarui Yang (Shanghai Jiaotong University) · Qi Wu (Shanghai Jiaotong University) · Ling Pei (Shanghai Jiao Tong University)
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma (Shanghai Jiao Tong University) · Xieyuanli Chen (National University of Defense Technology) · Jiawei Huang (HAOMO Technology Co., Ltd) · Jingyi Xu (Beijing Institute of Technology) · Zhen Luo (Beijing Institute of Technology) · Jintao Xu (Xi'an Jiaotong University) · Weihao Gu (Tsinghua University) · Rui Ai (HAOMO.AI Technology Co., Ltd.) · Hesheng Wang (Shanghai Jiao Tong University)
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz (Technion - Israel Institute of Technology, Technion) · Yair Kittenplon (AWS AI Labs) · Aviad Aberdam (Amazon AWS AI) · Elad Ben Avraham (Amazon) · Oren Nuriel (Amazon) · Shai Mazor (Amazon) · Ron Litman (Amazon AI Labs)
Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space
Chengyang Hu (Shanghai Jiao Tong University) · Ke-Yue Zhang (Tencent) · Taiping Yao (Tencent Youtu Lab) · Shouhong Ding (Tencent Youtu Lab) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
Towards Efficient Replay in Federated Incremental Learning
Yichen Li (Huazhong University of Science and Technology) · Qunwei Li (Ant Group) · Haozhao Wang (Huazhong University of Science and Technology) · Ruixuan Li (Huazhong University of Science and Technology) · Wenliang Zhong (Ant Group) · Guannan Zhang (Tongji University)
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
Alexandros Delitzas (ETH Zurich) · Ayça Takmaz · Federico Tombari (Google, TUM) · Robert Sumner (Massachusetts Institute of Technology) · Marc Pollefeys (ETH Zurich / Microsoft) · Francis Engelmann (Department of Computer Science, ETHZ - ETH Zurich)
Effective Video Mirror Detection with Inconsistent Motion Cues
Alex Warren (Swansea University) · Ke Xu (City University of Hong Kong) · Jiaying Lin (City University of Hong Kong) · Gary Tam (Swansea University) · Rynson W.H. Lau (City University of Hong Kong)
Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng (South China University of Technology) · Danqing Huang (Microsoft) · YU QIAO (Central South University) · Hu Zheng (Keio University, Tokyo Institute of Technology) · Chin-Yew Lin (Microsoft) · Tong Zhang (South China University of Technology) · C. L. Philip Chen (South China University of Technology)
ControlRoom3D: Room Generation using Semantic Controls
Jonas Schult (Rheinisch Westfälische Technische Hochschule Aachen) · Sam Tsai (Meta) · Lukas Hoellein · Bichen Wu (Facebook) · Jialiang Wang (Facebook) · Chih-Yao Ma (Facebook) · Kunpeng Li (Meta) · Xiaofang Wang (Meta) · Felix Wimbauer (Technical University of Munich) · Zijian He · Peizhao Zhang (Facebook) · Bastian Leibe (RWTH Aachen University) · Peter Vajda (Facebook) · Ji Hou (Facebook)
LAN: Learning to Adapt Noise for Image Denoising
Changjin Kim (Hanyang University) · Tae Hyun Kim (Hanyang Univ.) · Sungyong Baik (Hanyang University)
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Kelvin C.K. Chan (Google DeepMind) · Yang Zhao (Google) · Xuhui Jia (Google) · Ming-Hsuan Yang (University of California at Merced) · Huisheng Wang (Google)
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li (Peking University) · Yiming Qin (Peking University) · Minghang Zheng (Peking University) · Xin Jin (Beijing Electronic Science and Technology Institute) · Yang Liu (Peking University)
Restricted Memory Banks Improve Video Object Segmentation: A Revisit
Junbao Zhou · Ziqi Pang (UIUC) · Yu-Xiong Wang
DiaLoc: An Iterative Approach to Embodied Dialog Localization
Chao Zhang (Toshiba Europe Ltd) · Mohan Li (Toshiba Europe Ltd) · Ignas Budvytis (University of Cambridge) · Stephan Liwicki (Toshiba Europe Ltd)
Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu (Department of Computer Science, ETHZ - ETH Zurich) · Prashanth Chandran · Sebastian Weiss (DisneyResearch|Studios) · Markus Gross (Disney Research, Disney) · Gaspard Zoss (Disney Research, Disney) · Derek Bradley (DisneyResearch|Studios)
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen (VinAI Research) · Anh Tran (VinAI Research)
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang (Rutgers University) · Bichen Wu (Facebook) · Xiaoyan Wang (Massachusetts Institute of Technology) · Yaqiao Luo (Facebook) · Luxin Zhang (Meta) · Yinan Zhao (Facebook) · Peter Vajda (Facebook) · Dimitris N. Metaxas (Rutgers) · Licheng Yu
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
Hao Xiong (Shanghai Jiao Tong University) · Yehui Tang (Shanghai Jiaotong University) · Xinyu Ye (Shanghai Jiaotong University) · Junchi Yan (Shanghai Jiao Tong University)
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Moreno D'Incà (University of Trento) · Elia Peruzzo (University of Trento) · Massimiliano Mancini (University of Trento) · Dejia Xu (University of Texas at Austin) · Vidit Goel (Georgia Tech | UIUC / Oregon | PAIR) · Xingqian Xu (University of Illinois, Urbana Champaign) · Zhangyang Wang (University of Texas at Austin) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR) · Nicu Sebe (University of Trento)
Shadows Don’t Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar (Department of Computer Science at University of Illinois Urbana-Champaign) · Hanlin Mai (University of Illinois Urbana Champaign) · Amitabh Mahapatra (University of Illinois Urbana-Champaign) · David Forsyth (University of Illinois at Urbana-Champaign) · Svetlana Lazebnik (University of Illinois at Urbana-Champaign) · Anand Bhattad
Neural Implicit Morphing of Face Images
Guilherme Schardong (Institute of Systems and Robotics, University of Coimbra) · Tiago Novello (IMPA) · Hallison Paz (IMPA) · Iurii Medvedev (Institute of Systems and Robotics, University of Coimbra) · Vinícius Silva (PUC-Rio) · Luiz Velho (IMPA) · Nuno Gonçalves (University of Coimbra)
GDA: Generalized Diffusion for Robust Test-time Adaptation
Yun-Yun Tsai (Columbia University) · Fu-Chen Chen (Amazon Lab126) · Albert Chen (Amazon) · Junfeng Yang (Columbia University) · Che-Chun Su (Amazon) · Min Sun (Amazon/NTHU) · Cheng-Hao Kuo (Amazon)
Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu (Shanghai Jiao Tong University) · Liyao Xiang (Shanghai Jiao Tong University) · Hangyu Ye (Shanghai Jiaotong University) · Dixi Yao (University of Toronto) · Pengzhi Chu (Shanghai Jiaotong University) · Baochun Li (University of Toronto)
Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds
Zhimin Yuan (School of Informatics Xiamen University) · Wankang Zeng (Xiamen University) · Yanfei Su (Xiamen University) · Weiquan Liu (Xiamen University) · Ming Cheng (Xiamen University) · Yulan Guo (SUN YAT-SEN UNIVERSITY) · Cheng Wang (Xiamen University)
SubT-MRS Datasets: Pushing SLAM Towards All-weather Environments
Shibo Zhao (Carnegie Mellon University) · Yuanjun Gao (Carnegie Mellon University) · Tianhao Wu (University of Virginia, Charlottesville) · Damanpreet Singh (Carnegie Mellon University) · Rushan Jiang (Oracle) · Haoxiang Sun (Carnegie Mellon University) · Mansi Sarawata (Carnegie Mellon University) · Warren Whittaker (Carnegie Mellon University) · Ian Higgins (Carnegie Mellon University) · Shaoshu Su (State University of New York at Buffalo) · Yi Du (State University of New York at Buffalo) · Can Xu · John Keller (Carnegie Mellon University) · Jay Karhade (Carnegie Mellon University) · Lucas Nogueira (Carnegie Mellon University) · Sourojit Saha (Carnegie Mellon University) · Yuheng Qiu (Carnegie Mellon University) · Ji Zhang (Carnegie Mellon University) · Wenshan Wang (School of Computer Science, Carnegie Mellon University) · Chen Wang (University at Buffalo) · Sebastian Scherer
SpecNeRF: Gaussian Directional Encoding for Specular Reflections
Li Ma · Vasu Agrawal (Meta Reality Labs Research) · Haithem Turki (Carnegie Mellon University) · Changil Kim (Facebook) · Chen Gao (Meta) · Pedro V. Sander (Hong Kong University of Science and Technology) · Michael Zollhoefer (Meta) · Christian Richardt (Meta Reality Labs)
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao (Rutgers University, New Brunswick) · Long Zhao (Google Research) · Vijay Kumar BG (NEC Laboratories America) · Yumin Suh (NEC Labs America) · Dimitris N. Metaxas (Rutgers) · Manmohan Chandraker (UC San Diego) · Samuel Schulter (NEC Laboratories America)
SLICE: Stabilized LIME for Consistent Explanations for Image Classification
Revoti Prasad Bora (Norwegian University of Science and Technology) · Kiran Raja (Norwegian University of Science and Technology) · Philipp Terhörst (Paderborn University, Germany) · Raymond Veldhuis (University of Twente) · Raghavendra Ramachandra (Norwegian University of Science and Technology (NTNU))
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen (SketchX) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Subhadeep Koley (University of Surrey) · Aneeshan Sain (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Yi-Zhe Song
SPIDeRS: Structured Polarization for Invisible Depth and Reflectance Sensing
Tomoki Ichikawa (Kyoto University) · Shohei Nobuhara (Kyoto Institute of Technology) · Ko Nishino (Kyoto University)
Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking
Cheng-Yao Hong (Academia Sinica) · Yen-Chi Hsu (Department of Computer Science and Information Engineering, National Taiwan University) · Tyng-Luh Liu (IIS/Academia Sinica)
UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
Junsheng Zhou (Tsinghua University) · Weiqi Zhang (Tsinghua University) · Baorui Ma (BAAI) · Kanle Shi (Kuaishou Technology) · Yu-Shen Liu · Zhizhong Han (Wayne State University)
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo (Huazhong University of Science and Technology) · Hanyu Zhou (Huazhong University of Science and Technology) · Ying Nie (Huawei Noah's Ark Lab) · Feng Zhang (Huazhong University of Science and Technology) · Tianyu Guo (Peking University) · Nong Sang (Huazhong University of Science and Technology) · Yunhe Wang (Huawei Noah's Ark Lab) · Changxin Gao (Huazhong University of Science and Technology)
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun SHUM (The Hong Kong University of Science and Technology) · Jaeyeon Kim (Hong Kong University of Science and Technology) · Binh-Son Hua (Trinity College Dublin) · Thanh Nguyen (Deakin University) · Sai-Kit Yeung (The Hong Kong University of Science and Technology (HKUST))
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang (Tsinghua University) · Xin Wang · Hong Chen · Zihan Song (Tsinghua University) · Wenwu Zhu (Tsinghua University)
HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin (Zhejiang University) · Yongzhi Su (German Research Center for AI (DFKI)) · Praveen Nathan (German Research Center for AI) · Sandeep Inuganti (German Research Center for AI) · Yan Di (Technische Universität München) · Martin Sundermeyer · Fabian Manhardt (Google) · Didier Stricker (Universität Kaiserslautern) · Jason Rambach · Yu Zhang (Zhejiang University)
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
Xiaohan Lei · Min Wang (Institute of Artificial Intelligence, Hefei Comprehensive National Science Center) · Wengang Zhou (University of Science and Technology of China) · Li Li (University of Science and Technology of China) · Houqiang Li (University of Science and Technology of China)
AnyScene: Customized Image Synthesis with Composited Foreground
Ruidong Chen (Tianjin University) · Lanjun Wang (Tianjin University) · Weizhi Nie (Tianjin University) · Yongdong Zhang (University of Science and Technology of China) · An-An Liu (Tianjin University)
PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
Zhengyao Lv (University of Hong Kong) · Yuxiang Wei (The Hong Kong Polytechnic University) · Wangmeng Zuo (Harbin Institute of Technology) · Kwan-Yee K. Wong (The University of Hong Kong)
TE-TAD: Towards Fully End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
Ho-Joong Kim (Korea University) · Jung-Ho Hong (Korea University) · Heejo Kong (Korea University) · Seong-Whan Lee (Korea University)
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang · Jianmin Bao (Microsoft) · Wenming Weng · Ruoyu Feng (University of Science and Technology of China) · Dacheng Yin (University of Science and Technology of China) · Tao Yang (Xi'an JiaoTong University) · Jingxu Zhang (Research, Microsoft) · Qi Dai (Microsoft Research Asia) · Zhiyuan Zhao (Microsoft) · Chunyu Wang (Microsoft) · Kai Qiu (Microsoft) · Yuhui Yuan (Microsoft Research Asia) · Xiaoyan Sun (University of Science and Technology of China) · Chong Luo (Microsoft Research Asia) · Baining Guo (Microsoft Research)
Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
Saeed Khorram (Apple) · Mingqi Jiang (Oregon State University) · Mohamad Shahbazi (ETH Zürich) · Mohamad Hosein Danesh (McGill University) · Li Fuxin (Oregon State University)
TRINS: Towards Multimodal Language Models That Can Read
Ruiyi Zhang (Adobe Research) · Yanzhe Zhang (Georgia Institute of Technology) · Jian Chen (Mohamed bin Zayed University of Artificial Intelligence) · Yufan Zhou (State University of New York, Buffalo) · Jiuxiang Gu (Adobe Systems) · Changyou Chen (State University of New York, Buffalo) · Tong Sun (Adobe Systems)
MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video
Hengyi Wang (University College London) · Jingwen Wang (University College London) · Lourdes Agapito (University College London)
A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva (First Private Mathematical High School, Sofia, Bulgaria) · Mykyta Holubakha (INSAIT) · Andela Ilic (ETHZ - ETH Zurich) · Saman Motamed (INSAIT, Sofia University) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Danda Paudel (INSAIT, Sofia University)
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu (Nanyang Technological University) · Jianbin Zheng (South China University of Technology) · Chuanxia Zheng (University of Oxford) · Chaoyue Wang (JD Explore Academy) · Dacheng Tao · Tat-Jen Cham (Nanyang Technological University)
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu (University of Wisconsin-Madison) · SICHENG MO (University of California, Los Angeles) · Yin Li (University of Wisconsin, Madison)
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian (Australian National University) · Shu Zou (Australian National University) · Zhaoyuan Yang (General Electric) · Jing Zhang (Australian National University)
LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging
Haoyang Ge (Tianjin University) · Qiao Feng · Hailong Jia (Tianjin University) · Xiongzheng Li · Xiangjun Yin · You Zhou (Nanjing University) · Jingyu Yang (Tianjin University) · Kun Li
6-DoF Pose Estimation with MultiScale Residual Correlation
Yuelong Li (Amazon) · Yafei Mao (Amazon) · Raja Bala (Amazon) · Sunil Hadap (Amazon)
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
Jongwoo Choi (Visual Media Lab, KAIST) · Kwanggyoon Seo (KAIST) · Amirsaman Ashtari (MD Anderson Cancer Center) · Junyong Noh (Korea Advanced Institute of Science and Technology)
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
Xiaoyu Wu (Shanghai Jiaotong University) · Yang Hua (Queen's University Belfast) · Chumeng Liang (University of Southern California) · Jiaru Zhang (Shanghai Jiao Tong University) · Hao Wang (Louisiana State University) · Tao Song (Shanghai Jiao Tong University) · Haibing Guan (Shanghai Jiaotong University)
Robust Depth Enhancement via Polarization Prompt Fusion Tuning
Kei IKEMURA (KTH Royal Institute of Technology) · Yiming Huang (HKUST) · Felix Heide (Department of Computer Science, Princeton University) · Zhaoxiang Zhang (Institute of Automation, Chinese Academy of Sciences) · Qifeng Chen (Hong Kong University of Science and Technology) · Chenyang Lei (The Hong Kong University of Science and Technology)
Learning to Predict Activity Progress by Self-Supervised Video Alignment
Gerard Donahue (Northeastern University) · Ehsan Elhamifar
PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
Ying-Tian Liu (Tsinghua University) · Yuan-Chen Guo (Tsinghua University) · Guan Luo (Tsinghua University) · Heyi Sun (Tsinghua University) · Wei Yin (Shenzhen DJI Sciences and Technologies Ltd.) · Song-Hai Zhang (Tsinghua University)
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
Tianming Liang (Sun Yat-sen University) · Chaolei Tan (SUN YAT-SEN UNIVERSITY) · Beihao Xia (Huazhong University of Science and Technology) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY) · Jian-Fang Hu (SUN YAT-SEN UNIVERSITY)
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Xiaoyu Zhou (Peking University) · Zhiwei Lin (Peking University) · Xiaojun Shan (Peking University) · Yongtao Wang (Peking University) · Deqing Sun (Google) · Ming-Hsuan Yang (University of California at Merced)
Interactive3D: Create What You Want by Interactive 3D Generation
Shaocong Dong (Hong Kong University of Science and Technology) · Lihe Ding (The Chinese University of Hong Kong) · Zhanpeng Huang (SenseTime Research) · Zibin Wang (Sensetime Group Limited) · Tianfan Xue (The Chinese University of Hong Kong) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology)
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Leonardo Iurada (Polytechnic Institute of Turin) · Marco Ciccone (Politecnico di Torino) · Tatiana Tommasi (Politecnico di Torino)
CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image
Donggeun Yoon (Chungnam National University / KETI) · Donghyeon Cho (Hanyang University)
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan (VGG, University of Oxford) · Chuanxia Zheng (University of Oxford) · Weidi Xie (Shanghai Jiaotong University) · Andrew Zisserman (University of Oxford)
MiKASA: Multi-Key-Anchor Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang (DFKI) · Shaoxiang Wang (German Research Center for AI) · Alain Pagani (German Research Center for Artificial Intelligence (DFKI)) · Didier Stricker (Universität Kaiserslautern)
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu (Shanghai Jiaotong University) · Haoning Wu (Shanghai Jiao Tong University) · Yujie Zhong (Meituan Inc.) · Xiaoyun Zhang (Shanghai Jiao Tong University) · Yanfeng Wang (Shanghai Jiao Tong University) · Weidi Xie (Shanghai Jiaotong University)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
Jianan Fan (University of Sydney) · Dongnan Liu (University of Sydney) · Hang Chang (Lawrence Berkeley National Lab) · Heng Huang (University of Pittsburgh) · Mei Chen · Weidong Cai (The University of Sydney)
Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling
Ziwen Li (Huazhong University of Science and Technology) · Feng Zhang (Huazhong University of Science and Technology) · Meng Cao (Mohamed bin Zayed University of Artificial Intelligence) · Jinpu Zhang (Huazhong University of Science and Technology) · Yuanjie Shao (Huazhong University of Science and Technology) · Yuehuan Wang (Huazhong University of Science and Technology) · Nong Sang (Huazhong University of Science and Technology)
Gaussian Splatting SLAM
Hidenobu Matsuki (Imperial College London) · Riku Murai (Imperial College London) · Paul Kelly (Imperial College London) · Andrew J. Davison (Imperial College London)
A Simple Baseline for Efficient Hand Mesh Reconstruction
Zhishan Zhou · Shihao Zhou · Zhi Lv · Minqiang Zou · Yao Tang · Jiajun Liang
EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting
Zitao Wang · Qiguang Miao (Xidian University) · Yue Xi (Xi'an University of Electronic Science and Technology) · Peipei Zhao (Xi'an University of Electronic Science and Technology)
Privacy-preserving Optics for Enhancing Protection in Face De-identification
Jhon Lopez (Universidad Industrial de Santander) · Carlos Hinojosa (KAUST) · Henry Arguello (Universidad Industrial de Santander) · Bernard Ghanem (KAUST)
BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks
Shangqian Gao (University of Pittsburgh) · Yanfu Zhang (College of William and Mary) · Feihu Huang (Nanjing University of Aeronautics and Astronautics) · Heng Huang (University of Pittsburgh)
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo (Tsinghua University) · Xingqian Xu (University of Illinois, Urbana Champaign) · Yifan Pu (Tsinghua University) · Zanlin Ni (Tsinghua University) · Chaofei Wang (Tsinghua University) · Manushree Vasu (Georgia Institute of Technology) · Shiji Song (Tsinghua University) · Gao Huang (Tsinghua University) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Yabin Zhang (The Hong Kong Polytechnic University) · Wenjie Zhu · Hui Tang (Hong Kong University of Science and Technology) · Zhiyuan Ma · Kaiyang Zhou (Hong Kong Baptist University) · Lei Zhang (The Hong Kong Polytechnic University)
A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization
Hongwei Ren (Hong Kong University of Science and Technology) · Jiadong Zhu (The Hong Kong University of Science and Technology (Guangzhou)) · Yue Zhou (Hong Kong University of Science and Technology) · Haotian FU (Hong Kong University of Science and Technology) · Yulong Huang (Central South University) · Bojun Cheng (Hong Kong University of Science and Technology)
Transductive Zero-Shot & Few-Shot CLIP
Ségolène Martin (TU Berlin) · Yunshi HUANG (École de technologie supérieure, Université du Québec) · Fereshteh Shakeri (École de technologie supérieure) · Jean-Christophe Pesquet (CentraleSupelec) · Ismail Ben Ayed (ETS Montreal)
LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Yunshi HUANG (École de technologie supérieure, Université du Québec) · Fereshteh Shakeri (École de technologie supérieure) · Jose Dolz (École de technologie supérieure) · Malik Boudiaf (École de technologie supérieure) · Houda Bahig (University of Montreal) · Ismail Ben Ayed (ETS Montreal)
Estimating Extreme 3D Image Rotations using Cascaded Attention
Shay Dekel (Bar Ilan University) · Yosi Keller (Bar Ilan University) · Martin Čadík (Brno University of Technology)
ReCoRe: Regularized Contrastive Representation Learning of World Model
Rudra P.K. Poudel (Toshiba Europe Ltd) · Harit Pandya (Toshiba Europe) · Stephan Liwicki (Toshiba Europe Ltd) · Roberto Cipolla (University of Cambridge)
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang (Princeton University) · Zhizhou Sha (Tsinghua University) · Zheng Ding (University of California, San Diego) · Yilin Wang (Tsinghua University) · Zhuowen Tu (University of California, San Diego)
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi (Korea University) · Sanghyeok Lee (Korea University) · Jaewon Chu (Korea University) · Minhyuk Choi (Korea University) · Hyunwoo J. Kim (Korea University)
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee (Korea University) · Joonmyung Choi (Korea University) · Hyunwoo J. Kim (Korea University)
Unmixing before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis
Yang Yu · Erting Pan (Wuhan University) · Xinya Wang (Wuhan University) · Yuheng Wu (Wuhan University) · Xiaoguang Mei (Wuhan University) · Jiayi Ma (Wuhan University)
Referring Expression Counting
Siyang Dai (Singapore University of Technology and Design) · Jun Liu · Ngai-Man Cheung (Singapore University of Technology and Design)
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick · Guangxing Han (Columbia University) · Rui Hou (Meta Inc.) · Sayan Nag (University of Toronto) · Ser-Nam Lim (Meta AI) · Nicolas Ballas (Facebook) · Qifan Wang (Meta AI) · Rama Chellappa (Johns Hopkins University) · Amjad Almahairi (Facebook)
Few-Shot Object Detection with Foundation Models
Guangxing Han (Columbia University) · Ser-Nam Lim (Meta AI)
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Ziyue Feng (Clemson University) · Huangying Zhan (OPPO US Research Center) · Zheng Chen (Indiana University, Bloomington) · Qingan Yan (OPPO US Research Center) · Xiangyu Xu · Changjiang Cai · Bing Li (Clemson University) · Qilun Zhu (Clemson University) · Yi Xu (OPPO US Research Center)
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou · Xiaojun Jia (Chinese Academy of Sciences) · Jindong Gu (University of Oxford & Google Research) · Li Liu (University of Oulu) · Siyuan Liang (National University of Singapore) · Bangyan He (Institute of Information Engineering, CAS) · Xiaochun Cao (SUN YAT-SEN UNIVERSITY)
CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
Townim Chowdhury · Kewen Liao (Australian Catholic University) · Vu Minh Hieu Phan (University of Adelaide) · Minh-Son To (Flinders University of South Australia) · Yutong Xie (University of Adelaide) · Kevin Hung (Royal Adelaide Hospital) · David Ross (University of South Australia) · Anton van den Hengel (University of Adelaide) · Johan Verjans (University of Adelaide) · Zhibin Liao (University of Adelaide)
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan (University of Adelaide) · Yutong Xie (University of Adelaide) · Yuankai Qi (The University of Adelaide) · Lingqiao Liu · Liyang Liu (University of Adelaide) · Bowen Zhang (The University of Adelaide) · Zhibin Liao (University of Adelaide) · Qi Wu (University of Adelaide) · Minh-Son To (Flinders University of South Australia) · Johan Verjans (University of Adelaide)
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Yutong Xie (University of Adelaide) · Qi Chen (The University of Adelaide) · Sinuo Wang (University of Adelaide) · Minh-Son To (Flinders University of South Australia) · Iris Lee (South Australia medical imaging) · Ee Win Khoo (The Queen Elizabeth Hospital) · Kerolos Hendy (Flinders University of South Australia) · Daniel Koh (Monash University, Malaysia Campus) · Yong Xia (Northwestern Polytechnical University) · Qi Wu (University of Adelaide)
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana (Google) · Daniel Glasner (Google) · Srikumar Ramalingam (Google) · Andreas Veit (Google) · Ayan Chakrabarti (Google) · Sanjiv Kumar (Google)
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang (The Chinese University of Hong Kong, Shenzhen) · Bingliang Li (The Chinese University of Hong Kong (Shenzhen)) · Ailing Zeng (IDEA) · Lei Zhang (International Digital Economy Academy (IDEA)) · Ruimao Zhang (The Chinese University of Hong Kong (Shenzhen))
Spectrum AUC Difference (SAUCD): Human Aligned 3D Shape Evaluation
Tianyu Luan (State University of New York at Buffalo) · Zhong Li (InnoPeak Technology) · Lele Chen (Sony America) · Xuan Gong (Harvard University) · Lichang Chen (Department of Computer Science, University of Maryland, College Park) · Yi Xu (OPPO US Research Center) · Junsong Yuan (State University of New York at Buffalo)
FreeMan: Towards benchmarking 3D human pose estimation under Real-World Conditions
Jiong WANG (Fudan University) · Fengyu Yang (Chinese University of Hong Kong(Shenzhen)) · Bingliang Li (The Chinese University of Hong Kong (Shenzhen)) · Wenbo Gou (Carnegie Mellon University) · Danqi Yan (The Chinese University of Hong Kong Shenzhen) · Ailing Zeng (IDEA) · Yijun Gao (Tencent Turing Lab) · Junle Wang (Tencent) · Yanqing Jing (Tencent) · Ruimao Zhang (The Chinese University of Hong Kong (Shenzhen))
BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed
Abhishek Tandon (Carnegie Mellon University) · Anujraaj Goyal (Carnegie Mellon University) · Henry M. Clever (NVIDIA) · Zackory Erickson (Carnegie Mellon University)
AssistGUI: Task-Oriented PC Graphical User Interface Automation
Difei Gao (None) · Lei Ji (Research, Microsoft) · Zechen Bai (Show Lab, National University of Singapore) · Mingyu Ouyang (National University of Singapore) · Peiran Li (National University of Singapore) · Dongxing Mao (SUTD) · Qin WU (National University of Singapore) · Weichen Zhang (National University of Singapore) · Peiyi Wang (National University of Singapore) · Xiangwu Guo (South China University of Technology) · Hengxu Wang (National University of Singapore) · Luowei Zhou (Google) · Mike Zheng Shou (National University of Singapore)
Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization
Takuhiro Kaneko (NTT Corporation)
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang (University of Amsterdam) · Hazel Doughty (Leiden University) · Cees G. M. Snoek (University of Amsterdam)
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
Vidit Goel (Georgia Tech | UIUC / Oregon | PAIR) · Elia Peruzzo (University of Trento) · Yifan Jiang (University of Texas at Austin) · Dejia Xu (University of Texas at Austin) · Xingqian Xu (University of Illinois, Urbana Champaign) · Nicu Sebe (University of Trento) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Zhangyang Wang (University of Texas at Austin) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
Utility-Fairness Trade-Offs and How to Find Them
Sepehr Dehdashtian (Michigan State University) · Bashir Sadeghi (Michigan State University) · Vishnu Naresh Boddeti (None)
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng (Department of Computer Science, University of Oxford) · Matheus Gadelha (Adobe Systems) · Thibault Groueix (Adobe Systems) · Matthew Fisher (Adobe Research) · Radomir Mech (University of Calgary) · Andrew Markham (University of Oxford) · Niki Trigoni (University of Oxford)
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue (Ohio State University) · Yuansheng Ni (University of Waterloo) · Kai Zhang (Ohio State University, Columbus) · Tianyu Zheng (Beijing University of Posts and Telecommunications) · Ruoqi Liu (Ohio State University) · Ge Zhang (University of Waterloo) · Samuel Stevens (Ohio State University, Columbus) · Dongfu Jiang (University of Waterloo) · Weiming Ren (University of Waterloo) · Yuxuan Sun (Westlake University) · Cong Wei (University of Waterloo) · Botao Yu (The Ohio State University) · Ruibin Yuan (Hong Kong University of Science and Technology) · Renliang Sun (International Digital Economy Academy) · Ming Yin (Princeton University) · Boyuan Zheng (Ohio State University, Columbus) · Zhenzhu Yang (China University of Geoscience Beijing) · Yibo Liu (University of Victoria) · Wenhao Huang (BAAI) · Huan Sun (Ohio State University, Columbus) · Yu Su (Ohio State University) · Wenhu Chen (University of Waterloo)
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Qu Wentao (Nanjing University of Science and Technology) · Yuantian Shao (Nanjing University of Science and Technology) · Lingwu Meng (Nanjing University of Science and Technology) · Xiaoshui Huang (Shanghai AI Laboratory) · Liang Xiao (Nanjing University of Science and Technology)
Efficient Solution of Point-Line Absolute Pose
Petr Hruby (Department of Computer Science, ETHZ - ETH Zurich) · Timothy Duff (University of Washington) · Marc Pollefeys (ETH Zurich / Microsoft)
CAMixerSR: Only Details Need More "Attention"
Yan Wang (Nankai University) · Yi Liu (ByteDance Inc.) · Shijie Zhao (ByteDance Inc.) · Junlin Li (ByteDance Inc.) · Li zhang (Bytedance Inc.)
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He · Yiheng Deng (Zhejiang University) · SHIXIANG TANG (The Chinese University of Hong Kong) · Qihao CHEN (Liaoning Technical University) · Qingsong Xie (OPPO) · Yizhou Wang (None) · Lei Bai (Shanghai AI Laboratory) · Feng Zhu (SenseTime Group LTD) · Rui Zhao (Qing Yuan Research Institute, Shanghai Jiao Tong University) · Wanli Ouyang (University of Sydney) · Donglian Qi (Zhejiang University) · Yunfeng Yan (Zhejiang University)
CoDe: An Explicit Content Decoupling Framework for Image Restoration
Enxuan Gu (Dalian University of Technology) · Hongwei Ge (Dalian University of Technology) · Yong Guo (Max-Planck Institute for Informatics)
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou (National Technical University of Athens) · Spyros Gidaris (Valeo.ai) · Konstantinos Karantzalos (IMIS - "Athena" Research Center) · Nikos Komodakis (University of Crete)
ToNNO: Tomographic Reconstruction of a Neural Network’s Output for Weakly Supervised Segmentation of 3D Medical Images
Marius Schmidt-Mengin (None) · Alexis Benichoux (INRIA) · Shibeshih Belachew (Therapanacea) · Nikos Komodakis (University of Crete) · Nikos Paragios (Ecole Centrale de Paris)
Physics-aware Hand-object Interaction Denoising
Haowen Luo (Tsinghua University) · Yunze Liu (None) · Li Yi
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu (Shanghai Jiao Tong University) · Xintao Lv (Shanghai Jiaotong University) · Yichao Yan (Shanghai Jiao Tong University) · Xin Jin (Eastern Institute of Technology, Ningbo) · Wu Shuwen (Shanghai Jiaotong University) · Congsheng Xu (Shanghai Jiaotong University) · Yifan Liu (Shanghai Jiao Tong University) · Yizhou Zhou (WeChat AI) · Fengyun Rao (WeChat, Tencent Inc.) · Xingdong Sheng (Shanghai Jiaotong University) · Yunhui LIU (Lenovo Research) · Wenjun Zeng (None) · Xiaokang Yang (Shanghai Jiao Tong University, China)
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
Yijun Yang (University of Technology Sydney) · Tianyi Zhou (University of Maryland, College Park) · kanxue Li (Yunnan University) · Dapeng Tao (Yunnan University) · Lusong Li (JDT) · Li Shen (JD Explore Academy) · Xiaodong He (JD AI Research) · Jing Jiang (University of Technology Sydney) · Yuhui Shi (Southern University of Science and Technology)
One-Shot Open Affordance Learning with Foundation Models
Gen Li (University of Edinburgh) · Deqing Sun (Google) · Laura Sevilla-Lara (University of Edinburgh) · Varun Jampani (Google Research)
Self-Supervised Dual Contouring
Ramana Sundararaman (École Polytechnique) · Roman Klokov (École Polytechnique) · Maks Ovsjanikov (Ecole Polytechnique, France)
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman (Oregon State University) · Stefan Lee (Oregon State University) · Scott Cohen (Adobe Systems) · Kushal Kafle (Adobe Systems)
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang (City University of Hong Kong) · Weiming Zhuang (Sony Research) · Chen Chen (Sony AI) · Lingjuan Lyu (Sony AI)
C$^\text{2}$RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Yiqun Lin (The Hong Kong University of Science and Technology) · Jiewen Yang (Hong Kong University of Science and Technology) · hualiang wang (HKUST) · Xinpeng Ding (The Hong Kong University of Science and Technology) · Wei Zhao (Beijing University of Aeronautics and Astronautics) · Xiaomeng Li (The Hong Kong University of Science and Technology)
Tactile-Augmented Radiance Fields
Yiming Dou (University of Michigan - Ann Arbor) · Fengyu Yang (Yale University) · Yi Liu (University of Michigan - Ann Arbor) · Antonio Loquercio (University of California, Berkeley) · Andrew Owens (University of Michigan)
Consistent Prompting for Rehearsal-Free Continual Learning
Zhanxin Gao (Sun Yat-sen University) · Jun Cen (None) · Xiaobin Chang (SUN YAT-SEN UNIVERSITY)
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park (Pohang University of Science and Technology (POSTECH)) · Jeongyeon Hwang (Pohang University of Science and Technology) · Sunung Mun (Pohang University of Science and Technology) · Sangdon Park (POSTECH) · Jungseul Ok (POSTECH)
Open-Vocabulary Video Anomaly Detection
Peng Wu (Northwest Polytechnical University Xi'an) · Xuerong Zhou (Northwest Polytechnical University Xi'an) · Guansong Pang (Singapore Management University) · Yujia Sun (Xi'an University of Electronic Science and Technology) · Jing Liu (Guangzhou Institute of Technology, Xidian University) · Peng Wang (Northwestern Polytechnical University) · Yanning Zhang (Northwestern Polytechnical University)
Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion
Litu Rout (University of Texas at Austin) · Yujia Chen (Google) · Abhishek Kumar (Google DeepMind) · Constantine Caramanis (University of Texas, Austin) · Sanjay Shakkottai (University of Texas, Austin) · Wen-Sheng Chu (Google Research)
Language Model Guided Interpretable Video Action Reasoning
Ning Wang (Xidian University) · Guangming Zhu (Xidian University) · Hongsheng Li (Xi'an University of Electronic Science and Technology) · Liang Zhang (Xidian University) · Syed Afaq Ali Shah (Edith Cowan University) · Mohammed Bennamoun (University of Western Australia)
Purified and Unified Steganographic Network
GuoBiao Li (Fudan University) · Sheng Li (Fudan University) · Zicong Luo (Fudan University) · Zhenxing Qian (Fudan University) · Xinpeng Zhang (Fudan University)
Deformable One-shot Face Stylization via DINO Semantic Guidance
Yang Zhou (Shenzhen University) · Zichong Chen (Shenzhen University) · Hui Huang (Shenzhen University)
Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling
Jianan Li (University of Chinese Academy of Sciences) · Qiulei Dong (Institute of Automation, Chinese Academy of Sciences)
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam (National Yang Ming Chiao Tung University) · Cheng-Kun Yang (MediaTek) · Min-Hung Chen (NVIDIA) · Jen-Hui Chuang (None) · Yen-Yu Lin (National Yang Ming Chiao Tung University)
Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures
Huijie Zhang (University of Michigan - Ann Arbor) · Yifu Lu (University of Michigan - Ann Arbor) · Ismail Alkhouri (Michigan State University; University of Michigan) · Saiprasad Ravishankar (Michigan State University) · Dogyoon Song (University of Michigan - Ann Arbor) · Qing Qu (University of Michigan)
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Yang Zhou (SenseTime Research) · Hao Shao (The Chinese University of Hong Kong) · Letian Wang (University of Toronto) · Steven L. Waslander (University of Toronto) · Hongsheng Li (The Chinese University of Hong Kong) · Yu Liu (The Chinese University of Hong Kong)
VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan (University of Massachusetts Amherst) · Difan Liu (Adobe Research) · Shantanu Agarwal (Balbix) · Matthew Fisher (Adobe Research) · Michaël Gharbi (Massachusetts Institute of Technology) · Oliver Wang (Adobe Research) · Alec Jacobson (University of Toronto and Adobe Systems) · Evangelos Kalogerakis (UMass Amherst)
Noise-free Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Haichao Zhang (Northeastern University) · Yi Xu (Northeastern University) · Hongsheng Lu (Toyota Motor North America) · Takayuki Shimizu (Toyota Motor North America, Inc.) · Yun Fu (Northeastern University)
Learned representation-guided diffusion models for large-image generation
Alexandros Graikos (Stony Brook University) · Srikar Yellapragada (Stony Brook University) · Minh-Quan Le (State University of New York at Stony Brook) · Saarthak Kapse (State University of New York at Stony Brook) · Prateek Prasanna (State University of New York, Stony Brook) · Joel Saltz (State University of New York at Stony Brook) · Dimitris Samaras (Stony Brook University)
Building Optimal Neural Architectures using Interpretable Knowledge
Keith Mills (University of Alberta) · Fred Han (Huawei Technologies Ltd.) · Mohammad Salameh (Huawei Technologies Canada Ltd.) · Shengyao Lu (University of Alberta) · CHUNHUA ZHOU (Huawei Technologies Ltd.) · Jiao He (Huawei) · Fengyu Sun (Tongji University) · Di Niu (University of Alberta)
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han (Amazon/AWS) · Shuai Zhang (Amazon) · Xingjian Shi (Boson AI) · Markus Reichstein (Max-Planck Institute)
IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM
Minghao Yin (The University of Hong Kong) · Shangzhe Wu (Stanford University) · Kai Han (The University of Hong Kong)
Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang (Tsinghua University) · Quanmin Liang (Sun Yat-sen University) · Yijie Yu (Tsinghua University) · Chujun Qin (China Southern Power Grid) · Xiawu Zheng (Xiamen University) · Kai Huang (SUN YAT-SEN UNIVERSITY) · Zikun Zhou (Peng Cheng Laboratory) · Wenming Yang (Tsinghua University)
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang (University of California, San Diego) · Yuzhe Qin (University of California, San Diego) · Kaiming Kuang (University of California, San Diego) · Yigit Korkmaz (University of Southern California) · Akhilan Gurumoorthy (University of California, San Diego) · Hao Su (UCSD) · Xiaolong Wang (UCSD)
Improving Plasticity in Online Continual Learning via Collaborative Learning
Maorong Wang (The University of Tokyo) · Nicolas Michel (None) · Ling Xiao (None) · Toshihiko Yamasaki (None)
Video Harmonization with Triplet Spatio-Temporal Variation Patterns
Zonghui Guo · XinYu Han (Ocean University of China) · Jie Zhang (Institute of Computing Technology, Chinese Academy of Sciences) · Shiguang Shan (Institute of Computing Technology, Chinese Academy of Sciences) · Haiyong Zheng (Ocean University of China)
Semantic Line Combination Detector
JINWON KO (Korea University, Seoul) · Dongkwon Jin (Korea University) · Chang-Su Kim (Korea University)
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Chong Bao (Zhejiang University) · Yinda Zhang (Google) · Yuan Li (Zhejiang University) · Xiyu Zhang (Zhejiang University) · Bangbang Yang (ByteDance Inc) · Hujun Bao (Zhejiang University) · Marc Pollefeys (ETH Zurich / Microsoft) · Guofeng Zhang (Zhejiang University) · Zhaopeng Cui (None)
Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen (Xidian University) · Zhiyu Zhu (City University of Hong Kong) · Yifan Zhang (City University of Hong Kong) · Junhui Hou (City University of Hong Kong) · Guangming Shi (Xidian University) · Jinjian Wu (Xidian University)
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Lingteng Qiu (None) · Guanying Chen (The Chinese University of Hong Kong, Shenzhen) · Xiaodong Gu (Alibaba Group) · Qi Zuo (Alibaba Group) · Mutian Xu (None) · Yushuang Wu (The Chinese University of Hong Kong (Shenzhen)) · Weihao Yuan (Alibaba Group) · Zilong Dong (Alibaba Group) · Liefeng Bo (None) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero (EPFL - EPF Lausanne) · Luca Zancato (AWS AI Labs) · Matthew Trager (Amazon) · Siddharth Choudhary (Amazon AGI) · Pramuditha Perera (Amazon) · Alessandro Achille (California Institute of Technology) · Ashwin Swaminathan (University of Maryland, College Park) · Stefano Soatto (AWS)
Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis
Atefeh Khoshkhahtinat (None) · Ali Zafari (West Virginia University) · Piyush Mehta (West Virginia University) · Nasser Nasrabadi (West Virginia University)
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
Xudong Wang (Electrical Engineering & Computer Science Department, University of California Berkeley) · Ishan Misra (Facebook) · Ziyun Zeng (UCB) · Rohit Girdhar (Meta) · Trevor Darrell (Electrical Engineering & Computer Science Department)
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection
Chengjie Wang (Tencent Youtu Lab; Shanghai Jiao Tong University) · wenbing zhu (Fudan University) · Bin-Bin Gao (None) · Zhenye Gan (Tencent Youtu Lab) · Jiangning Zhang (Tencent Youtu Lab) · Zhihao Gu (Shanghai Jiao Tong University) · Bruce Qian (None) · Mingang Chen (Shanghai Development Center of Computer Software Technology) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang (University of Adelaide) · Bohan Zhuang (Monash University) · Qi Wu (University of Adelaide)
3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
Felix Taubner (LG Electronics) · Prashant Raina (LG Electronics) · Mathieu Tuli (LG Electronics Canada Incorporated, TAIL) · Eu Wern Teh (LG Corporation) · Chul Lee (LG Electronics) · Jinmiao Huang (Meta)
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Peng Dai (Bytedance Inc.) · Yang Zhang (None) · Tao Liu (ByteDance Inc.) · ZhenFan (Bytedance) · Tianyuan Du (Bytedance) · Zhuo Su (ByteDance) · Xiaozheng Zheng (ByteDance) · Zeming Li (BYTEDANCE)
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu (Chinese University of Hong Kong (Shenzhen)) · Yingwei Pan (HiDream.ai) · Yehao Li (HiDream.ai) · Ting Yao (JD AI Research) · Zhenglong Sun (The Chinese University of Hong Kong, Shenzhen) · Tao Mei (JD Explore Academy) · Chang-Wen Chen (The Hong Kong Polytechnic University)
Traffic Scene Parsing through the TSP6K Dataset
Peng-Tao Jiang (vivo Mobile Communication (Hangzhou) Co., Ltd.) · Yuqi Yang (Nankai University) · Yang Cao (Hong Kong University of Science and Technology) · Qibin Hou (Nankai University) · Ming-Ming Cheng (Nankai University, Tsinghua University) · Chunhua Shen (Zhejiang University)
Garment Recovery with Shape and Deformation Priors
Ren Li (EPFL) · Corentin Dumery (EPFL) · Benoît Guillard (Swiss Federal Institute of Technology Lausanne) · Pascal Fua (Swiss Federal Institute of Technology Lausanne)
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
Fei Wang (Hefei University of Technology) · Dan Guo (Hefei University of Technology) · Kun Li (Hefei University of Technology) · Zhun Zhong (University of Nottingham) · Meng Wang (Hefei University of Technology)
KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas (Apple Inc.) · Yao-Hung Hubert Tsai (Apple) · Timothy Barfoot (University of Toronto) · Jian Zhang (Apple)
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen (Megvii Technology Inc.) · Yucheng Zhao (University of Science and Technology of China) · Yingfei Liu (Megvii Technology Inc.) · Fan Jia (Megvii Technology Inc.) · Yanhui Wang (None) · Chong Luo (Microsoft Research Asia) · Chi Zhang (Columbia University) · Tiancai Wang (Megvii Technology Inc.) · Xiaoyan Sun (University of Science and Technology of China) · Xiangyu Zhang (MEGVII Technology)
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Muyang Li (None) · Tianle Cai (Princeton University) · Jiaxin Cao (Lepton AI) · Qinsheng Zhang (Georgia Institute of Technology) · Han Cai (Massachusetts Institute of Technology) · Junjie Bai (Lepton AI Inc.) · Yangqing Jia (Lepton AI) · Kai Li (Princeton University) · Song Han (Massachusetts Institute of Technology)
What Moves Together Belongs Together
Jenny Seidenschwarz (Department of Informatics, Technische Universität München) · Aljoša Ošep (Carnegie Mellon University) · Francesco Ferroni · Simon Lucey (University of Adelaide) · Laura Leal-Taixe (NVIDIA)
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Sadeep Jayasumana (Google) · Srikumar Ramalingam (Google) · Andreas Veit (Google) · Daniel Glasner (Google) · Ayan Chakrabarti (Google) · Sanjiv Kumar (Google)
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Yufei Han (None) · Heng Guo (Beijing University of Posts and Telecommunications) · Koki Fukai (Osaka University) · Hiroaki Santo (Osaka University) · Boxin Shi (Peking University) · Fumio Okura (Osaka University) · Zhanyu Ma (Beijing University of Posts and Telecommunications) · Yunpeng Jia (Beijing University of Posts and Telecommunications)
Text-Driven Image Editing via Learnable Regions
Yuanze Lin (University of Oxford) · Yi-Wen Chen (University of California, Merced) · Yi-Hsuan Tsai (Google) · Lu Jiang (Carnegie Mellon University) · Ming-Hsuan Yang (University of California at Merced)
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Feng Lu (Tsinghua University) · Xiangyuan Lan (Peng Cheng Laboratory) · Lijun Zhang (University of Chinese Academy of Sciences) · Dongmei Jiang (Peng Cheng Laboratory) · Yaowei Wang (Pengcheng Laboratory) · Chun Yuan (Tsinghua University)
pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
David Charatan (Massachusetts Institute of Technology) · Sizhe Lester Li (Massachusetts Institute of Technology) · Andrea Tagliasacchi (Simon Fraser University, Google Brain) · Vincent Sitzmann (Massachusetts Institute of Technology)
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An (University of Copenhagen) · Guolei Sun (None) · Yun Liu (Institute for Infocomm Research, A*STAR) · Fayao Liu (Institute for Infocomm Research, A*STAR) · Zongwei Wu (Bayerische Julius-Maximilians-Universität Würzburg) · Dan Wang (University of Copenhagen) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Serge Belongie (University of Copenhagen)
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang (China University of Petroleum (East China)) · Siyue Yu (Xi'an Jiaotong-Liverpool University) · Yunchao Wei (Beijing Jiaotong University) · Yao Zhao (Beijing Jiaotong University) · Jimin Xiao (Xi'an Jiaotong-Liverpool University)
LEDITS++: Limitless Image Editing using Text-to-Image Models
Manuel Brack (Technische Universität Darmstadt) · Felix Friedrich (TU Darmstadt, Hessian.AI) · Katharina Kornmeier (Align Technology) · Linoy Tsaban (Hugging Face) · Patrick Schramowski (TU Darmstadt) · Kristian Kersting (TU Darmstadt) · Apolinário Passos (Universidade de Brasília)
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu (Peking University) · Fangyun Wei (None) · Yanye Lu (Peking University)
Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space
Naveen Kumar Kummari (Indian Institute of Technology Hyderabad, India) · Reshmi Mitra (Southeast Missouri State University) · Krishna Mohan Chalavadi (Indian Institute of Technology Hyderabad)
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
Jaewon Jung (Seoul National University) · Hongsun Jang (Seoul National University) · Jaeyong Song (Seoul National University) · Jinho Lee (Seoul National University)
Adversarial Distillation Based on Slack Matching and Attribution Region Alignment
Shenglin Yin (Peking University) · Zhen Xiao (Peking University) · Mingxuan Song (Peking University) · Jieyi Long (Theta Labs, Inc.)
Universal Robustness via Median Random Smoothing for Real-World Super-Resolution
Zakariya Chaouai (Paris-Saclay University, CEA, List) · Mohamed Tamaazousti (CEA/LIST)
RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection
Zhiwei Lin (Peking University) · Zhe Liu (University of Electronic Science and Technology of China) · Zhongyu Xia (Peking University) · Xinhao Wang (Peking University) · Yongtao Wang (Peking University) · Shengxiang Qi (Chongqing Changan Automobile Co., Ltd) · Yang Dong (Chongqing Changan Automobile Co., Ltd.) · Nan Dong (changan) · Le Zhang (University of Electronic Science and Technology of China) · Ce Zhu (University of Electronic Science and Technology of China)
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang (Nanyang Technological University) · Yinan He (Shanghai AI Laboratory) · Jiashuo Yu (Shanghai AI Laboratory) · Fan Zhang (None) · Chenyang Si (Nanyang Technological University Singapore) · Yuming Jiang (Nanyang Technological University) · Yuanhan Zhang (Nanyang Technological University) · Tianxing Wu (Nanyang Technological University) · Jin Qingyang (Nanyang Technological University) · Nattapol Chanpaisit (Nanyang Technological University) · Yaohui Wang (Shanghai AI Laboratory) · Xinyuan Chen (Shanghai Artificial Intelligence Laboratory) · Limin Wang (Nanjing University) · Dahua Lin (The Chinese University of Hong Kong) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Ziwei Liu (Nanyang Technological University)
The Neglected Tails in Vision-Language Models
Shubham Parashar (Texas A&M University - College Station) · Tian Liu (Texas A&M University - College Station) · Zhiqiu Lin (Carnegie Mellon University) · Xiangjue Dong (Texas A&M University - College Station) · Yanan Li (Zhejiang Lab) · James Caverlee (Texas A&M University) · Deva Ramanan (Carnegie Mellon University) · Shu Kong (University of Macau, Texas A&M University)
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu (North Carolina State University) · Ce Zheng (University of Central Florida) · Ming Qian (None) · Nan Xue (Ant Group) · Chen Chen · Zhebin Zhang (OPPO) · Chen Li (Innopeak Technology Inc.) · Tianfu Wu
SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder
Dihan Zheng (Tsinghua University) · Yihang Zou (Tsinghua University) · Xiaowen Zhang (Hisilicon) · Chenglong Bao (Tsinghua University)
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
Zechuan Zhang (Zhejiang University) · Zongxin Yang (Zhejiang University) · Yi Yang (Zhejiang University)
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang (Shanghai AI Laboratory) · Shenyuan Gao (HKUST) · Yihang Qiu (Shanghai Jiao Tong University) · Li Chen (The University of Hong Kong) · Tianyu Li (Fudan University) · Bo Dai (Shanghai AI Laboratory) · Kashyap Chitta · Penghao Wu (University of California, San Diego) · Jia Zeng (Shanghai Jiaotong University) · Ping Luo (The University of Hong Kong) · Jun Zhang (The Hong Kong University of Science and Technology) · Andreas Geiger (University of Tübingen) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Hongyang Li (Shanghai AI Lab)
Scaling Up Dynamic 3D Human-Scene Interaction Modelling
Nan Jiang (Peking University) · Zhiyuan Zhang (Department of Automation, Tsinghua University) · Hongjie Li (Peking University) · Xiaoxuan Ma (Peking University) · Zan Wang (Beijing Institute of Technology) · Yixin Chen (BIGAI) · Tengyu Liu (None) · Yixin Zhu (Peking University) · Siyuan Huang (Beijing Institute of General Artificial Intelligence)
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Chenyu You (Yale University) · Yifei Min (Yale University) · Weicheng Dai (Yale University) · Jasjeet Sekhon (Yale University) · Lawrence Staib (Yale University) · James Duncan (Yale University)
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
Zikai Xiao (None) · Guo-Ye Yang (None) · Xue Yang (Shanghai AI Laboratory) · Tai-Jiang Mu (Tsinghua University) · Junchi Yan (Shanghai Jiao Tong University) · Shi-Min Hu (Tsinghua University)
Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Song Tang (University of Shanghai for Science and Technology) · Wenxin Su (University of Shanghai for Science and Technology) · Mao Ye (University of Electronic Science and Technology of China) · Xiatian Zhu (University of Surrey)
Grounding and Enhancing Grid-based Models for Neural Fields
Zelin Zhao (Shanghai Jiao Tong University) · FENGLEI FAN (The Chinese University of Hong Kong) · Wenlong Liao (Shanghai Jiaotong University) · Junchi Yan (Shanghai Jiao Tong University)
Data-Free Quantization via Pseudo-label Filtering
Chunxiao Fan (Hefei University of Technology) · Ziqi Wang (Hefei University of Technology) · Dan Guo (Hefei University of Technology) · Meng Wang (Hefei University of Technology)
Neural Sign Actors: A diffusion model for 3D sign language production from text
Vasileios Baltatzis (None) · Rolandos Alexandros Potamias (Imperial College London) · Evangelos Ververas (Huawei Technologies Ltd.) · Guanxiong Sun (Huawei Technologies Ltd.) · Jiankang Deng (Imperial College London & Huawei UKRD) · Stefanos Zafeiriou (Imperial College London)
Seeing Motion at Nighttime with an Event Camera
Haoyue Liu (Huazhong University of Science and Technology) · Shihan Peng (Huazhong University of Science and Technology) · Lin Zhu (Beijing Institute of Technology) · Yi Chang (Huazhong University of Science and Technology) · Hanyu Zhou (Huazhong University of Science and Technology) · Luxin Yan (Huazhong University of Science and Technology)
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
David Rozenberszki (None) · Or Litany (NVIDIA / Technion) · Angela Dai
HumMUSS: Human Motion Understanding using State Space Models
Arnab Mondal (McGill University) · Stefano Alletto (Apple) · Denis Tome (Apple)
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee (University of Illinois Urbana-Champaign) · Bolin Lai (Georgia Institute of Technology) · Fiona Ryan (Georgia Institute of Technology) · Bikram Boote (University of Illinois, Urbana Champaign) · James Rehg (None)
APISR: Anime Production Inspired Real-World Anime Super-Resolution
Boyang Wang (University of Michigan - Ann Arbor) · Fengyu Yang (Yale University) · Xihang Yu (University of Michigan - Ann Arbor) · Chao Zhang (Zhejiang University) · Hanbin Zhao (Zhejiang University)
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
SICHENG MO (University of California, Los Angeles) · Fangzhou Mu (University of Wisconsin-Madison) · Kuan Heng Lin (University of California, Los Angeles) · Yanli Liu (Shein Technology LLC) · Bochen Guan (OPPO US Research Center) · Yin Li (University of Wisconsin, Madison) · Bolei Zhou (University of California, Los Angeles)
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
Rashindrie Perera (University of Melbourne) · Saman Halgamuge (University of Melbourne)
ShapeWalk: Compositional Shape Editing through Language-Guided Chains
Habib Slim (KAUST) · Mohamed Elhoseiny (KAUST)
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Pancheng Zhao (Nankai University) · Peng Xu (Tsinghua University) · Pengda Qin (Alibaba Group) · Deng-Ping Fan (ETH Zurich) · Zhicheng Zhang (Nankai University) · Guoli Jia (None) · Bowen Zhou (Tsinghua University) · Jufeng Yang (None)
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
Yunqi Miao (The University of Warwick) · Jiankang Deng (Imperial College London & Huawei UKRD) · Jungong Han (Aberystwyth University)
Hierarchical Histogram Threshold Segmentation – Auto-terminating High-detail Oversegmentation
Thomas Chang (Nuremberg Institute of Technology) · Simon Seibt (Georg-Simon-Ohm-Fachhochschule Nürnberg) · Bartosz von Rymon Lipinski (Technical University oAS Nuremberg)
G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping
Junfeng Cheng (Imperial College London) · Tania Stathaki (Imperial College London)
Dual-Enhanced Coreset Selection with Class-wise Collaboration for Online Blurry Class Incremental Learning
Yutian Luo (Renmin University of China) · Shiqi Zhao (China Unicom Research Institute) · Haoran Wu (China Unicom Research Institute) · Zhiwu Lu (Renmin University of China)
Generative Multi-modal Models are Good Class Incremental Learners
Xusheng Cao (Nankai University) · Haori Lu (Nankai University) · Linlan Huang (Nankai University) · Xialei Liu (Nankai University) · Ming-Ming Cheng (Nankai University, Tsinghua University)
TULIP: Multi-camera 3D Precision Assessment of Parkinson's Disease
Kyungdo Kim (Duke University) · Sihan Lyu (Duke University) · Sneha Mantri (Duke University) · Timothy DUNN (Duke University)
WaveMo: Learning Wavefront Modulations to See Through Scattering
Mingyang Xie (University of Maryland, College Park) · Haiyun Guo (Rice University) · Brandon Y. Feng (Massachusetts Institute of Technology) · Lingbo Jin (Rice University) · Ashok Veeraraghavan (William Marsh Rice University) · Christopher Metzler (University of Maryland, College Park)
BANF: Band-limited Neural Fields for Levels of Detail Reconstruction
Ahan Shabanov (Simon Fraser University) · Shrisudhan Govindarajan (Simon Fraser University) · Cody Reading (Simon Fraser University) · Leili Goli (University of Toronto) · Daniel Rebain (None) · Kwang Moo Yi (University Of British Columbia) · Andrea Tagliasacchi (Simon Fraser University, Google Brain)
Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle
Hyeokjun Kweon (KAIST) · Jihun Kim (KAIST) · Kuk-Jin Yoon (KAIST)
StraightPCF: Straight Point Cloud Filtering
Dasith de Silva Edirimuni (Deakin University) · Xuequan Lu (La Trobe University) · Gang Li (Deakin University) · Lei Wei (Deakin University) · Antonio Robles-Kelly (Defence Science and Technology Group (DST), Deakin University) · Hongdong Li (Australian National University)
SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving
Yiming Xie (Shenzhen International Graduate School, Tsinghua University) · Henglu Wei (Tsinghua University) · Zhenyi Liu (Stanford University) · Xiaoyu Wang (Department of Automation, Tsinghua University) · Xiangyang Ji (Tsinghua University)
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim (Beeble Inc.) · Minje Jang (Beeble Inc.) · Wonjun Yoon (Beeble Inc.) · Jisoo Lee (Beeble Inc.) · Donghyun Na (Beeble Inc.) · Sanghyun Woo (New York University)
An Empirical Study of Scaling Law for Scene Text Recognition
Miao Rang (Huawei Noah's Ark Lab) · Zhenni Bi (Huawei Noah Ark Lab) · Chuanjian Liu (Huawei Technologies Ltd.) · Yunhe Wang (Huawei Noah's Ark Lab) · Kai Han (Huawei Noah's Ark Lab)
PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition
Haosong Zhang (School of Computer Science and Engineering, Nanyang Technological University) · Mei Leong (A*STAR) · Liyuan Li (I2R, A*STAR) · Weisi Lin (Nanyang Technological University)
Sparse Semi-Detr: Sparse Learnable Queries for Semi-Supervised Object Detection
Tahira Shehzadi · Khurram Azeem Hashmi (DFKI - German Research Center for AI) · Didier Stricker (Universität Kaiserslautern) · Muhammad Zeshan Afzal (German Research Center for AI)
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou (Computer Science, Tsinghua University) · Chao Yang (Shanghai AI Laboratory) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Chengbin Quan (Tsinghua University) · Youjian Zhao (Tsinghua University)
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng (None) · Zhicheng Guo (Tsinghua University) · Jingwen Wu (University of Toronto) · Kechen Fang (Tsinghua University) · Peng Li (Tsinghua University) · Huaping Liu (Tsinghua University) · Yang Liu (Tsinghua University)
When StyleGAN Meets Stable Diffusion: a ${\mathcal{W}_+}$ Adapter for Personalized Image Generation
Xiaoming Li (MMLab@NTU) · Xinyu Hou (Nanyang Technological University) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY)
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Jiazhou Zhou (Hong Kong University of Science and Technology) · Xu Zheng (HKUST) · Yuanhuiyi Lyu (Hong Kong University of Science and Technology (Guangzhou)) · Lin Wang (Hong Kong University of Science and Technology)
UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan (Duke University, Meta, TikTok) · Lei Luo (Meta) · Zhuo Hui (Facebook) · Can Pu (Facebook) · Xiaoyu Xiang (Meta) · Rakesh Ranjan · Denis Demandolx (Meta)
Low-power, Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann (TU Berlin) · Suman Ghosh (TU Berlin) · Ignacio Juarez Martinez (University of Oxford) · Tom Hart (Oxford Brookes University) · Alex Kacelnik (University of Oxford) · Guillermo Gallego (TU Berlin-ECDF-SCIoI)
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Taeho Kang (Seoul National University) · Youngki Lee (Seoul National University)
GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Yuan Dong (Alibaba Group) · Qi Zuo (Alibaba Group) · Xiaodong Gu (Alibaba Group) · Weihao Yuan (Alibaba Group) · Zhengyi Zhao (Alibaba Group) · Zilong Dong (Alibaba Group) · Liefeng Bo (None) · Qixing Huang (University of Texas at Austin)
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
Yuhan Shen (Northeastern University) · Ehsan Elhamifar (None)
Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion
Sofia Casarin (Free University of Bozen-Bolzano) · Cynthia Ugwu (Free University of Bozen) · Sergio Escalera (Computer Vision Center) · Oswald Lanz (Free University of Bozen-Bolzano)
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He (University of Tübingen) · Garvita Tiwari (University of Tuebingen and MPI-Saarbrucken) · Tolga Birdal (Imperial College London) · Jan Lenssen (Saarland Informatics Campus, Max-Planck Institute) · Gerard Pons-Moll (University of Tübingen)
ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion
Jiayu Yang (Australian National University) · Ziang Cheng (Australian National University) · Yunfei Duan (Tencent Game) · Pan Ji (Tencent XR Vision Labs) · Hongdong Li (Australian National University)
Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks
Shin'ya Yamaguchi (Kyoto University) · Sekitoshi Kanai (NTT) · Kazuki Adachi (NTT) · Daiki Chijiwa (NTT, The University of Tokyo)
LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
Siyuan Cheng (Purdue University) · Guanhong Tao (Purdue University) · Yingqi Liu (Microsoft) · Guangyu Shen (Purdue University) · Shengwei An (Purdue University) · Shiwei Feng (Purdue University, West Lafayette) · Xiangzhe Xu (Purdue University) · Kaiyuan Zhang (Computer Science, Purdue University) · Shiqing Ma (University of Massachusetts at Amherst) · Xiangyu Zhang (Purdue University)
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
Haoxuanye Ji (Xi'an Jiaotong University) · Pengpeng Liang (Zhengzhou University) · Erkang Cheng (Nullmax)
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen (Carnegie Mellon University) · Wenjin Hou (Huazhong University of Science and Technology) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Fahad Shahbaz Khan (MBZUAI; Linköping University)
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
Zining Chen (Beijing University of Posts and Telecommunications) · Weiqiu Wang (Beijing University of Posts and Telecommunications) · Zhicheng Zhao (Beijing University of Posts and Telecommunications) · Fei Su (Beijing University of Posts and Telecommunications) · Aidong Men (Beijing University of Posts and Telecommunications) · Hongying Meng (None)
Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning
Rui Zhao (Xi'an Jiaotong University) · Bin Shi (Xi'an Jiaotong University) · Jianfei Ruan (Xi'an Jiaotong University) · Tianze Pan (Xi'an Jiaotong University) · Bo Dong (Xi'an Jiaotong University)
ManiFPT: Defining and Analyzing Fingerprints of Generative Models
Hae Jin Song (University of Southern California) · Mahyar Khayatkhoei (USC/ISI) · Wael AbdAlmageed (Clemson University)
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
Trung Dao (VinAI) · Duc H Vu (VinAI Research) · Cuong Pham (Posts & Telecommunications Institute of Technology and VinAI Research) · Anh Tran (VinAI Research)
MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang (Zhejiang University) · Siqi Li (Zhejiang University) · Junhao Chen (Zhejiang University) · Zhuangzhi Chen (Zhejiang University of Technology) · Tianxin Huang (Tencent youtu lab) · Linpeng Peng (Zhejiang University) · Yong Liu (Zhejiang University)
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
Pengze Zhang (Sun Yat-sen University) · Hubery Yin (Tencent) · Chen Li (Tencent) · Xiaohua Xie (SUN YAT-SEN UNIVERSITY)
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Haoning Wu (Nanyang Technological University) · Zicheng Zhang (Shanghai Jiaotong University) · Erli Zhang (Nanyang Technological University) · Chaofeng Chen (Nanyang Technological University) · Liang Liao (Nanyang Technological University) · Annan Wang (Nanyang Technological University) · Kaixin Xu (I2R, A*STAR) · Chunyi Li (None) · Jingwen Hou (Nanyang Technological University) · Guangtao Zhai (Shanghai Jiao Tong University) · Xue Geng (Institute for Infocomm Research, A*STAR) · Wenxiu Sun (SenseTime Research and Tetras.AI) · Qiong Yan (SenseTime Research) · Weisi Lin (Nanyang Technological University)
Enhancing the Power of OOD Detection via Sample-Aware Model Selection
Feng Xue (Shanghai Jiaotong University) · Zi He (HuNan University) · Yuan Zhang (Beijing Normal University) · Chuanlong Xie (Beijing Normal University) · Zhenguo Li (Huawei) · Falong Tan (Hunan University)
REACTO: Reconstructing Articulated Objects from a Single Video
Chaoyue Song (Nanyang Technological University) · Jiacheng Wei (Nanyang Technological University) · Chuan-Sheng Foo (Centre for Frontier AI Research, A*STAR) · Guosheng Lin (Nanyang Technological University) · Fayao Liu (Institute for Infocomm Research, A*STAR)
Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement
Daiwei Yu (Hangzhou City University) · Zhuorong Li (Hangzhou City University) · Lina Wei (Hangzhou City University) · Canghong Jin (Hangzhou City University) · Yun Zhang (Hangzhou City University) · Sixian Chan (the College of Computer Science and Technology at Zhejiang University of Technology)
Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training
Shizhan Gong (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Qi Dou (The Chinese University of Hong Kong) · Farzan Farnia (The Chinese University of Hong Kong)
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong (State University of New York at Stony Brook) · Bin Chen (Bytedance Inc.) · Xiulong Liu (University of Washington) · Pawel Polak (State University of New York at Stony Brook) · Peng Zhang (Bytedance)
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita (The University of Tokyo) · Naoto Inoue (CyberAgent) · Kotaro Kikuchi (None) · Kota Yamaguchi (CyberAgent) · Kiyoharu Aizawa (The University of Tokyo)
Object Recognition as Next Token Prediction
Kaiyu Yue (University of Maryland, College Park) · Bor-Chun Chen (Facebook) · Jonas Geiping (University of Maryland, College Park) · Hengduo Li (Meta AI) · Tom Goldstein (University of Maryland, College Park) · Ser-Nam Lim (Meta AI)
A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
Siddharth Srivastava (TensorTour Inc) · Gaurav Sharma (TensorTour Inc.)
Transfer CLIP for Generalizable Image Denoising
Jun Cheng (Huazhong University of Science and Technology) · Dong Liang (Huazhong University of Science and Technology) · Shan Tan (Huazhong University of Science and Technology)
LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
Jing Zhang (New York University) · Irving Fang (New York University) · Hao Wu (New York University) · Akshat Kaushik (New York University) · Alice Rodriguez (New York University) · Hanwen Zhao (New York University) · Juexiao Zhang (New York University) · Zhuo Zheng (Stanford University) · Radu Iovita (New York University) · Chen Feng (New York University)
Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
Mingyuan Meng (The University of Sydney) · Dagan Feng (University of Sydney) · Lei Bi (the University of Sydney) · Jinman Kim (University of Sydney)
Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling
Jiawei Shi (Northwest Polytechnical University Xi'an) · Hui Deng (Northwest Polytechnical University Xi'an) · Yuchao Dai (Northwestern Polytechnical University)
Multiway Point Cloud Mosaicking with Diffusion and Global Optimization
Shengze Jin (Department of Computer Science, ETHZ - ETH Zurich) · Iro Armeni (Stanford University) · Marc Pollefeys (ETH Zurich / Microsoft) · Daniel Barath (ETHZ - ETH Zurich)
IDGuard: Robust, General, Identity-centric POI Proactive Defense Against Face Editing Abuse
Yunshu Dai (SUN YAT-SEN UNIVERSITY) · Jianwei Fei (Nanjing University of Information Science and Technology) · Fangjun Huang (SUN YAT-SEN UNIVERSITY)
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren (Beijing Jiaotong University) · Zhicheng Huang (University of Science and Technology Beijing) · Yunchao Wei (Beijing Jiaotong University) · Yao Zhao (Beijing Jiaotong University) · Dongmei Fu (University of Science and Technology Beijing) · Jiashi Feng (ByteDance) · Xiaojie Jin (ByteDance Inc./TikTok)
Cross Initialization for Face Personalization of Text-to-Image Models
Lianyu Pang (None) · Jian Yin · Haoran Xie (Lingnan University) · Qiping Wang (East China Normal University) · Qing Li (The Hong Kong Polytechnic University) · Xudong Mao (None)
Single-Model and Any-Modality for Video Object Tracking
Zongwei Wu (Bayerische Julius-Maximilians-Universität Würzburg) · Jilai Zheng (Shanghai Jiaotong University) · Xiangxuan Ren (Shanghai Jiao Tong University) · Florin-Alexandru Vasluianu (Bayerische Julius-Maximilians-Universität Würzburg) · Chao Ma (Shanghai Jiao Tong University) · Danda Paudel (INSAIT, Sofia University) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Radu Timofte (University of Würzburg)
iKUN: Speak to Trackers without Retraining
Yunhao Du (Beijing University of Posts and Telecommunications) · Cheng Lei (Beijing University of Posts and Telecommunications) · Zhicheng Zhao (Beijing University of Posts and Telecommunications) · Fei Su (Beijing University of Posts and Telecommunications)
Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
Daniel Rebain (None) · Soroosh Yazdani (Google) · Kwang Moo Yi (University Of British Columbia) · Andrea Tagliasacchi (Simon Fraser University, Google Brain)
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi (The Hong Kong University of Science and Technology) · Jiahao Pan (Hong Kong University of Science and Technology) · Peng Li (Tsinghua University) · Ruibin Yuan (Hong Kong University of Science and Technology) · Xiaowei Chi (Hong Kong University of Science and Technology) · Mengfei Li (Hong Kong University of Science and Technology) · Wenhan Luo (SUN YAT-SEN UNIVERSITY) · Wei Xue (Hong Kong University of Science and Technology) · Shanghang Zhang (Peking University) · Qifeng Liu (The Hong Kong University of Science and Technology) · Yike Guo (Imperial College London)
OmniMotionGPT: Animal Motion Generation with Limited Data
Zhangsihao Yang (None) · Mingyuan Zhou (Innopeak Technology) · Mengyi Shan (University of Washington) · Bingbing Wen (University of Washington) · Ziwei Xuan (Innopeak Technology) · Mitch Hill (None) · Junjie Bai (CuraCloud Corporation) · Guo-Jun Qi (University of Central Florida) · Yalin Wang (Arizona State University)
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
Wenhao Tang (Chongqing University) · Fengtao ZHOU (Department of Computer Science and Engineering, Hong Kong University of Science and Technology) · Sheng Huang (Chongqing University) · Xiang Zhu (Chongqing University) · Yi Zhang (Chongqing University) · Bo Liu (Rutgers University)
Stratified Avatar Generation from Sparse Observations
Han Feng (Wuhan University) · Wenchao Ma (Pennsylvania State University) · Quankai Gao (University of Southern California) · Xianwei Zheng (Wuhan University) · Nan Xue (Ant Group) · Huijuan Xu (Pennsylvania State University--University Park)
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
Gangwei Xu (Huazhong University of Science and Technology) · Yujin Wang (Shanghai Artificial Intelligence Laboratory) · Jinwei Gu (The Chinese University of Hong Kong) · Tianfan Xue (The Chinese University of Hong Kong) · Xin Yang (Huazhong University of Science and Technology)
SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction
Conghao Wong (Huazhong University of Science and Technology) · Beihao Xia (Huazhong University of Science and Technology) · Ziqian Zou (Huazhong University of Science and Technology) · Yulong Wang (Huazhong Agricultural University) · Xinge You (Huazhong University of Science and Technology)
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang (Institute of Computing Technology, Chinese Academy of Sciences) · Xiangyang Li (Institute of Computing Technology, Chinese Academy of Sciences) · Jiahao Yang (Institute of Computing Technology, Chinese Academy of Sciences) · Yeqi Liu (Institute of Computing Technology, Chinese Academy of Sciences) · Junjie Hu (University of Wisconsin, Madison) · Ming Jiang (Indiana University) · Shuqiang Jiang (Institute of Computing Technology, Chinese Academy of Sciences)
Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations
Kewei Wang (Huazhong University of Science and Technology) · Yizheng Wu (Nanyang Technological University) · Jun Cen (None) · Zhiyu Pan (None) · Xingyi Li (Huazhong University of Science and Technology) · Zhe Wang (Sensetime Group Limited) · Zhiguo Cao · Guosheng Lin (Nanyang Technological University)
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil (ML Labs, Dublin City University) · Raiymbek Akshulakov (University of California, Berkeley) · YASSER ABDELAZIZ DAHOU DJILALI (Technology Innovation Institute) · Mohamed El Amine Seddik (Technology Innovation Institute) · Sanath Narayan (Technology Innovation Institute) · Karttikeya Mangalam (University of California Berkeley) · Noel O'Connor (Dublin City University)
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei (Nanjing University of Science and Technology) · Tao Chen (None) · Xiruo Jiang (None) · Huafeng Liu (Nanjing University of Science and Technology) · Zeren Sun (Nanjing University of Science and Technology) · Yazhou Yao (Nanjing University of Science and Technology)
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Zhongwei Zhang (University of Science and Technology of China) · Fuchen Long (JD.com) · Yingwei Pan (HiDream.ai) · Zhaofan Qiu (University of Science and Technology of China) · Ting Yao (JD AI Research) · Yang Cao (University of Science and Technology of China) · Tao Mei (JD Explore Academy)
Pose-Transformed Equivariant Network for 3D Point Trajectory Prediction
Ruixuan Yu (Shandong University) · Jian Sun (Xi'an Jiaotong University)
Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
Chunghyun Park (POSTECH) · Seungwook Kim (POSTECH) · Jaesik Park (Seoul National University) · Minsu Cho (POSTECH)
FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
Rishub Tamirisa (Lapis Labs) · Chulin Xie (University of Illinois, Urbana Champaign) · Wenxuan Bao (University of Illinois Urbana Champaign) · Andy Zhou (Lapis Labs) · Ron Arel (Lapis Labs, UIUC) · Aviv Shamsian (Bar-Ilan University)
Test-Time Adaptation for Depth Completion
Hyoungseob Park (Yale University) · Anjali W Gupta (Yale) · Alex Wong (Yale University)
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li (SIAT, UCAS) · Yali Wang (SIAT, Chinese Academy of Sciences) · Yinan He (Shanghai AI Laboratory) · Yizhuo Li (The University of Hong Kong) · Yi Wang (Shanghai AI Laboratory) · Yi Liu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Zun Wang (Australian National University) · Jilan Xu (None) · Guo Chen (Nanjing University) · Ping Luo (The University of Hong Kong) · Limin Wang (Nanjing University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory)
Efficient Hyperparameter Optimization with Adaptive Fidelity Identification
Jiantong Jiang (The University of Western Australia) · Zeyi Wen (Hong Kong University of Science and Technology (Guangzhou)) · Atif Mansoor (University of Western Australia) · Ajmal Mian (University of Western Australia)
MESA: Matching Everything by Segmenting Anything
Yesheng Zhang (Shanghai Jiaotong University) · Xu Zhao (Shanghai Jiao Tong University)
Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning
Hao Zhang (Wuhan University) · Linfeng Tang (Wuhan University) · Xinyu Xiang (Wuhan University) · Xuhui Zuo (Wuhan University) · Jiayi Ma (Wuhan University)
Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
Shuji Habuchi (Nagoya University) · Keita Takahashi (Nagoya University) · Chihiro Tsutake (Nagoya University) · Toshiaki Fujii (Nagoya University) · Hajime Nagahara (Osaka University)
Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu (The Chinese University of Hong Kong) · Yuechen Zhang (The Chinese University of Hong Kong) · Wenbo Li (Huawei Technologies Ltd.) · Zhe Lin (Adobe Research) · Jiaya Jia (The Chinese University of Hong Kong)
GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu · Han Xue (Tsinghua University) · Kunming Luo (Hong Kong University of Science and Technology) · Ping Tan (Hong Kong University of Science and Technology) · Li Yi
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong (University of Toronto) · Zhiqi Li (Nanjing University) · Yuntao Chen (CAIR, HKISI, CAS) · Feng Wang (Tsinghua University) · Xizhou Zhu (Shanghai AI Laboratory) · Jiapeng Luo (SenseTime Research) · Wenhai Wang (Shanghai AI Laboratory) · Tong Lu (Nanjing University) · Hongsheng Li (The Chinese University of Hong Kong) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Lewei Lu (SenseTime) · Jie Zhou (None) · Jifeng Dai (Tsinghua University)
Dual-scale Transformer for Large-scale Single-Pixel Imaging
Gang Qu (Westlake University) · Ping Wang (Zhejiang University) · Xin Yuan (Westlake University)
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen (Samsung Research America) · Nina Shvetsova (None) · Andrew Rouditchenko (Massachusetts Institute of Technology) · Daniel Kondermann (Quality Match GmbH) · Samuel Thomas (IBM Research) · Shih-Fu Chang (Columbia University) · Rogerio Feris (International Business Machines) · James Glass (Massachusetts Institute of Technology) · Hilde Kuehne (University of Bonn, MIT-IBM Watson AI Lab)
2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images
Junkai Deng (Institute of Software, Chinese Academy of Sciences) · Fei Hou (Institute of Software, Chinese Academy of Sciences) · Xuhui Chen (Institute of Software, Chinese Academy of Sciences) · Wencheng Wang (Institute of Software, Chinese Academy of Sciences) · Ying He (Nanyang Technological University)
A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
Zhixiong Yang (National University of Defense Technology) · Jingyuan Xia (National University of Defense Technology) · Shengxi Li (Beihang University) · Xinghua Huang (National University of Defense Technology) · Shuanghui Zhang (National University of Defense Technology) · Zhen Liu (National University of Defense Technology) · Yaowen Fu (National University of Defense Technology) · Yongxiang Liu (National University of Defense Technology)
Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World
Huiyuan Fu (Beijing University of Posts and Telecommunications) · Fei Peng (Beijing University of Posts and Telecommunications) · Xianwei Li (Beijing University of Posts and Telecommunications) · Yejun Li (Beijing University of Posts and Telecommunications) · Xin Wang (State University of New York at Stony Brook) · Huadong Ma (Beijing University of Posts and Telecommunications, Tsinghua University)
Parameter Efficient Self-Supervised Geospatial Domain Adaptation
Linus Scheibenreif (University of St.Gallen) · Michael Mommert (Stuttgart University of Applied Sciences) · Damian Borth (University of St.Gallen)
Multimodal Representation Learning by Alternating Unimodal Adaptation
Xiaohui Zhang (Beijing Jiaotong University) · Jaehong Yoon (University of North Carolina at Chapel Hill) · Mohit Bansal (University of North Carolina at Chapel Hill) · Huaxiu Yao (Department of Computer Science, University of North Carolina at Chapel Hill)
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
Octave Mariotti (University of Edinburgh) · Oisin Mac Aodha (University of Edinburgh) · Hakan Bilen (University of Edinburgh)
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
Zhengang Li (Northeastern University) · Yan Kang (None) · Yuchen Liu (None) · Difan Liu (Adobe Research) · Tobias Hinz (Adobe Systems) · Feng Liu (Adobe Systems) · Yanzhi Wang (Northeastern University)
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan (Alibaba Group) · Sibo Song (Alibaba Group) · Wenwen Yu (Huazhong University of Science and Technology) · Yuliang Liu (Huazhong University of Science and Technology) · Wenqing Cheng (Huazhong University of Science and Technology) · Fei Huang (Alibaba Group) · Xiang Bai (Huazhong University of Science and Technology) · Cong Yao (Alibaba DAMO Academy) · Zhibo Yang (Alibaba Group)
Compositional Video Understanding with Spatiotemporal Structure-based Transformers
Hoyeoung Yun (Hanyang University) · Jinwoo Ahn (Hanyang University) · Minseo Kim (Hanyang University) · Eun-Sol Kim (Hanyang University)
CoDi-2: Interleaved and In-Context Any-to-Any Generation
Zineng Tang (University of North Carolina, Chapel Hill) · Ziyi Yang (Microsoft) · MAHMOUD KHADEMI (Microsoft) · Yang Liu (Microsoft) · Chenguang Zhu (Zoom) · Mohit Bansal (University of North Carolina at Chapel Hill)
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim (Seoul National University) · Jungwon Park (Seoul National University) · Wonjong Rhee (Seoul National University)
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Jingxuan Xu (Beijing Jiaotong University) · Wuyang Chen (University of Texas at Austin) · Yao Zhao (Beijing Jiaotong University) · Yunchao Wei (Beijing Jiaotong University)
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Yamei Chen (Technische Universität München) · Yan Di (Technische Universität München) · Guangyao Zhai (Technical University of Munich) · Fabian Manhardt (Google) · Chenyangguang Zhang (Tsinghua University) · Ruida Zhang (Department of Automation, Tsinghua University) · Federico Tombari (Google, TUM) · Nassir Navab (TU Munich) · Benjamin Busam (Technical University of Munich)
Robust Synthetic-to-Real Transfer for Stereo Matching
Jiawei Zhang (Beijing University of Aeronautics and Astronautics) · Jiahe Li (Beijing University of Aeronautics and Astronautics) · Lei Huang (Beihang University) · Xiaohan Yu (Macquarie University) · Lin Gu (RIKEN / the University of Tokyo) · Jin Zheng (Beijing University of Aeronautics and Astronautics) · Xiao Bai (Beijing University of Aeronautics and Astronautics)
Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
Yu Chen (Beijing Waiyan Online Digital Technology Co., Ltd) · Fei Gao (Hangzhou Institute of Technology, Xidian University) · Yanguang Zhang (Hangzhou Dianzi University) · Maoying Qiao (University of Technology Sydney) · Nannan Wang (Xidian University)
ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
Dar-Yen Chen (SketchX) · Hamish Tennent (PicCollage) · Ching-Wen Hsu (PicCollage)
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee (Arizona State University) · Tejas Gokhale (University of Maryland, Baltimore County) · Chitta Baral (Arizona State University) · 'YZ' Yezhou Yang (Arizona State University)
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae (GIST) · Young-Jae Park (GIST) · Hae-Gon Jeon (GIST)
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie (German Research Center for Artificial Intelligence) · Alain Pagani (German Research Center for Artificial Intelligence (DFKI)) · Didier Stricker (Universität Kaiserslautern)
TexVocab: Texture Vocabulary-conditioned Human Avatars
Yuxiao Liu (None) · Zhe Li (Tsinghua University) · Yebin Liu (Tsinghua University) · Haoqian Wang (Tsinghua University)
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach
TAO MA (Peking University) · Bing Bai (Qiyuan Lab) · Haozhe Lin (None) · Heyuan Wang (Peking University) · Yu Wang (Qiyuan Lab) · Lin Luo (Peking University) · Lu Fang (Tsinghua University, Tsinghua University)
Mitigating Motion Blur in Neural Radiance Fields with Events and Frames
Marco Cannici (Robotics and Perception Group, Department of Informatics, University of Zurich) · Davide Scaramuzza (University of Zurich)
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Thomas Wimmer (École Polytechnique & Technical University of Munich) · Peter Wonka (KAUST) · Maks Ovsjanikov (Ecole Polytechnique, France)
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun (Shanghai Jiao Tong University) · Ye Fang (None) · Tong Wu (None) · Pan Zhang (Shanghai Artificial Intelligence Laboratory) · Yuhang Zang (Nanyang Technological University) · Shu Kong (University of Macau, Texas A&M University) · Yuanjun Xiong (Mthreads) · Dahua Lin (The Chinese University of Hong Kong) · Jiaqi Wang (Shanghai AI Laboratory)
Discriminability-Driven Channel Selection for Out-of-Distribution Detection
Yue Yuan (Shandong University) · Rundong He (Shandong University) · Yicong Dong (Shandong University) · Zhongyi Han (Shandong University) · Yilong Yin (Shandong University)
DemoFusion: Democratising High-Resolution Image Generation With No $$$
Ruoyi DU (Beijing University of Posts and Telecommunications) · Dongliang Chang (Tsinghua University) · Timothy Hospedales (None) · Yi-Zhe Song (None) · Zhanyu Ma (Beijing University of Post and Telecommunication)
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun (University of Seoul) · Youngmin Ro (University of Seoul)
SketchINR: A First Look into Sketches as Implicit Neural Representations
Hmrishav Bandyopadhyay (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Pinaki Nath Chowdhury (University of Surrey) · Aneeshan Sain (University of Surrey) · Tao Xiang (University of Surrey) · Timothy Hospedales (None) · Yi-Zhe Song (None)
Makeup Prior Models for 3D Facial Makeup Estimation and Applications
Xingchao Yang (Cyberagent) · Takafumi Taketomi (CyberAgent) · Yuki Endo (University of Tsukuba) · Yoshihiro Kanamori (University of Tsukuba)
Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
Jinyoung Jun (Korea University) · Jae-Han Lee (Gauss Labs) · Chang-Su Kim (Korea University)
Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras
Ashwath Shetty (Saarland Informatics Campus, Max-Planck Institute) · Marc Habermann (Saarland Informatics Campus, Max-Planck Institute) · Guoxing Sun (Max Planck Institute for Informatics) · Diogo Luvizon (Saarland Informatics Campus, Max-Planck Institute) · Vladislav Golyanik (MPI for Informatics) · Christian Theobalt (MPI Informatik)
Neighbor Relations Matter in Video Scene Detection
Jiawei Tan (Chongqing University) · Hongxing Wang (Chongqing University) · Jiaxin Li (Chongqing University) · Zhilong Ou (Chongqing University) · Zhangbin Qian (Chongqing University)
NOPE: Novel Object Pose Estimation from a Single Image
Van Nguyen Nguyen (Ecole des Ponts ParisTech) · Thibault Groueix (Adobe Systems) · Georgy Ponimatkin (CIIRC, Czech Technical University, Czech Technical University of Prague) · Yinlin Hu (Magic Leap) · Renaud Marlet (INRIA) · Mathieu Salzmann (EPFL) · Vincent Lepetit (Ecole des Ponts ParisTech)
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Jiawei He (Institute of automation, Chinese Academy of Sciences) · Lue Fan (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Hongxin Li (Institute of Automation, Chinese Academy of Sciences) · Yuntao Chen (CAIR, HKISI, CAS) · Zhaoxiang Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Youtian Lin (Nanjing university) · Zuozhuo Dai (Alibaba Group) · Siyu Zhu (Fudan University) · Yao Yao (Nanjing University)
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai (University of Science and Technology of China) · HangChen (University of Science and Technology of China) · Jun Du (University of Science and Technology of China) · Ruoyu Wang (University of Science and Technology of China) · shihao chen (University of Science and Technology of China) · Haotian Wang (University of Science and Technology of China) · Chin-Hui Lee (Georgia Institute of Technology)
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim (Arizona State University) · Kyle Min (Intel Labs) · Maitreya Patel (Arizona State University) · Sheng Cheng (Arizona State University) · 'YZ' Yezhou Yang (Arizona State University)
Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics
Xingtao Wang (Harbin Institute of Technology) · Hongliang Wei (Harbin Institute of Technology) · Xiaopeng Fan (Harbin Institute of Technology) · Debin Zhao (Harbin Institute of Technology)
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
Jakub Micorek (Technische Universität Graz) · Horst Possegger (Graz University of Technology) · Dominik Narnhofer (Technische Universität Graz) · Horst Bischof (Graz University of Technology) · Mateusz Kozinski (Technische Universität Graz)
MuRF: Multi-Baseline Radiance Fields
Haofei Xu (ETH Zurich) · Anpei Chen (Department of Computer Science, ETHZ - ETH Zurich) · Yuedong Chen (Monash University) · Christos Sakaridis (ETH Zurich) · Yulun Zhang (Shanghai Jiao Tong University) · Marc Pollefeys (ETH Zurich / Microsoft) · Andreas Geiger (University of Tübingen) · Fisher Yu (ETH Zurich)
Link-Context Learning for Multimodal LLMs
Yan Tai (Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China) · Weichen Fan (HyperGAI) · Zhao Zhang (Sensetime Research) · Ziwei Liu (Nanyang Technological University)
InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields
Dongqing Wang (EPFL) · Tong Zhang (EPFL) · Alaa Abboud (EPFL - EPF Lausanne) · Sabine Süsstrunk (None)
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang (Peking University) · Jiaming Liu (Peking University) · Chenxuan Li (Peking university) · Yuan Zhang (Peking University) · Ma Junpeng (Peking University) · Xinyu Wei (Peking University) · Kevin Zhang (Peking University) · Maurice Chong (Peking University) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Yijiang Liu (Nanjing University) · Shanghang Zhang (Peking University)
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai (None) · Kuofeng Gao (Tsinghua University, Tsinghua University) · Shaobo Min (University of Science and Technology of China) · Shu-Tao Xia (Shenzhen International Graduate School, Tsinghua University) · Zhifeng Li (Tencent) · Wei Liu (Tencent AI Lab)
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva (University of Oxford) · Andrew Zisserman (University of Oxford)
TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
Xuying Zhang (Nankai University) · Bo-Wen Yin (Nankai University) · yuming chen (None) · Zheng Lin (Nankai University) · Yunheng Li (Nankai University) · Qibin Hou (Nankai University) · Ming-Ming Cheng (Nankai University, Tsinghua University)
Unsupervised Template-assisted Point Cloud Shape Correspondence Network
Jiacheng Deng (University of Science and Technology of China) · Jiahao Lu (University of Science and Technology of China) · Tianzhu Zhang (University of Science and Technology of China)
Efficient Model Stealing Defense with Noise Transition Matrix
Dong-Dong Wu (Southeast University) · Chilin Fu (Ant Group) · Weichang Wu (Alibaba Group) · Wenwen Xia (Shanghai Jiaotong University) · Xiaolu Zhang (None) · JUN ZHOU (Ant Group) · Min-Ling Zhang (Southeast University)
HOIAnimator: Text-Prompt Human-Object Animations Generation with Perceptive Diffusion Models
Wenfeng Song (Beijing Information Science and Technology University) · Xinyu Zhang (Beijing Information Science and Technology University) · Shuai Li (Beijing University of Aeronautics and Astronautics) · Yang Gao (Beijing University of Aeronautics and Astronautics) · Aimin Hao (None) · Xia HOU (Beijing Information Science & Technology University) · Chenglizhao Chen (China University of Petroleum) · Ning Li (Beijing Information Science and Technology University) · Hong Qin (Stony Brook University (State University of New York at Stony Brook))
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan (Zhejiang University) · Shiwei Zhang (Alibaba Group) · Xiang Wang (Huazhong University of Science and Technology) · Yujie Wei (Fudan University) · Tao Feng (Tsinghua University) · Yining Pan (Singapore University of Technology and Design) · Yingya Zhang (Alibaba Group) · Ziwei Liu (Nanyang Technological University) · Samuel Albanie (University of Cambridge) · Dong Ni (Zhejiang University)
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal (University of California, Los Angeles) · Yonatan Bitton (Google) · Idan Szpektor (Google) · Kai-Wei Chang (University of California, Los Angeles) · Aditya Grover (University of California, Los Angeles)
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu (Hong Kong University of Science and Technology (Guangzhou)) · Xu Zheng (HKUST) · Jiazhou Zhou (Hong Kong University of Science and Technology) · Lin Wang (Hong Kong University of Science and Technology)
HEAL-SWIN: A Vision Transformer On The Sphere
Oscar Carlsson (Division of Algebra and Geometry, Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg) · Jan E. Gerken (Chalmers University of Technology) · Hampus Linander (Chalmers University of Technology) · Heiner Spiess (Technische Universität Berlin) · Fredrik Ohlsson (Umea University) · Christoffer Petersson (Zenseact) · Daniel Persson (Chalmers University of Technology)
How Far Can We Compress Instant NGP-Based NeRF?
Yihang Chen (Shanghai Jiao Tong University) · Qianyi Wu (Monash University) · Mehrtash Harandi (Monash University) · Jianfei Cai (Monash University)
Putting the Object Back into Video Object Segmentation
Ho Kei Cheng (University of Illinois Urbana-Champaign) · Seoung Wug Oh (Adobe Systems) · Brian Price (Adobe Research) · Joon-Young Lee (Adobe Research) · Alexander G. Schwing (UIUC)
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu (University of Wisconsin-Madison) · Carter Sifferman (University of Wisconsin - Madison) · Sacha Jungerman (University of Wisconsin - Madison) · Yiquan Li (University of Wisconsin - Madison) · Zhiyue Han (None) · Michael Gleicher (Department of Computer Sciences, University of Wisconsin - Madison) · Mohit Gupta (Department of Computer Sciences, University of Wisconsin - Madison) · Yin Li (University of Wisconsin, Madison)
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
Kunyang Zhou (Southeast University)
FMA-Net: Flow Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Geunhyuk Youk (Korea Advanced Institute of Science and Technology) · Jihyong Oh (Chung-Ang University) · Munchurl Kim (Korea Advanced Institute of Science and Technology)
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
Shahaf Arica (Technion - Israel Institute of Technology) · Or Rubin (Technion - Israel Institute of Technology) · Sapir Gershov (Technion - Israel Institute of Technology) · Shlomi Laufer (Technion)
Color Shift Estimation-and-Correction for Image Enhancement
Yiyu Li (City University of Hong Kong) · Ke Xu (City University of Hong Kong) · Gerhard Hancke (None) · Rynson W.H. Lau (City University of Hong Kong)
UniDepth: Universal Monocular Metric Depth Estimation
Luigi Piccinelli (ETH Zurich) · Yung-Hsu Yang (None) · Christos Sakaridis (ETH Zurich) · Mattia Segu (ETH Zurich - Swiss Federal Institute of Technology) · Siyuan Li (ETH Zurich) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Fisher Yu (ETH Zurich)
Dexterous Grasp Transformer
Guo-Hao Xu (Sun Yat-sen University) · Yi-Lin Wei (SUN YAT-SEN UNIVERSITY) · Dian Zheng (None) · Xiao-Ming Wu (SUN YAT-SEN UNIVERSITY) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY)
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain (Georgia Tech) · Jianwei Yang (Microsoft Research) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
Ishak Ayad (ETIS & AGM, CY Cergy Paris University, ENSEA, CNRS) · Nicolas Larue (ETIS, CY Cergy Paris University, ENSEA, CNRS, University of Ljubljana) · Mai K. Nguyen (ETIS, CY Cergy Paris University, ENSEA, CNRS)
Cross-dimension Affinity Distillation for 3D EM Neuron Segmentation
Xiaoyu Liu (University of Science and Technology of China) · Miaomiao Cai (University of Science and Technology of China) · Yinda Chen (University of Science and Technology of China) · Yueyi Zhang (University of Science and Technology of China) · Te Shi (Institute of Artificial Intelligence, Hefei Comprehensive National Science Center) · Ruobing Zhang (Suzhou Institute of Biomedical Engineering and Technology) · Xuejin Chen (University of Science and Technology of China) · Zhiwei Xiong (None)
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Xunjiang Gu (University of Toronto) · Guanyu Song (University of Toronto) · Igor Gilitschenski (University of Toronto) · Marco Pavone (NVIDIA) · Boris Ivanovic (NVIDIA)
Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation
Keonhee Han (Technical University of Munich) · Dominik Muhle (Technical University of Munich) · Felix Wimbauer (Technical University of Munich) · Daniel Cremers (Technical University Munich)
CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection
JiaBao Wang (Nankai University) · yuming chen (None) · Zhaohui Zheng (Nankai University) · Xiang Li (Nankai University) · Ming-Ming Cheng (Nankai University, Tsinghua University) · Qibin Hou (Nankai University)
Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion
Lalit Manam (Indian Institute of Science) · Venu Madhav Govindu (Indian Institute of Science)
Seeing the World through Your Eyes
Hadi Alzayer (University of Maryland) · Kevin Zhang (UMD CP / Adobe) · Brandon Y. Feng (Massachusetts Institute of Technology) · Christopher Metzler (University of Maryland, College Park) · Jia-Bin Huang (University of Maryland, College Park)
Equivariant Multi-Modality Image Fusion
Zixiang Zhao (Xi'an Jiaotong University) · Haowen Bai (Xi'an Jiaotong University) · Jiangshe Zhang (Xi'an Jiaotong University) · Yulun Zhang (Shanghai Jiao Tong University) · Kai Zhang (None) · Shuang Xu (Northwest Polytechnical University Xi'an) · Dongdong Chen (Heriot-Watt University) · Radu Timofte (University of Würzburg) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
Jinfeng Xu (Huazhong University of Science and Technology) · Siyuan Yang (HUST) · Xianzhi Li (Huazhong University of Science and Technology) · Yuan Tang (Huazhong University of Science and Technology) · yixue Hao (Huazhong University of Science and Technology) · Long Hu (Huazhong University of Science and Technology) · Min Chen (South China University of Technology)
Residual Denoising Diffusion Models
Jiawei Liu (Shenyang Institute of Automation, Chinese Academy of Sciences) · Qiang Wang (Shenyang University) · Huijie Fan (None) · Yinong Wang (University of Hong Kong) · Yandong Tang (Shenyang Institute of Automation) · Liangqiong Qu (The University of Hong Kong)
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Zheng Li (Nankai University) · Xiang Li (Nankai University) · xinyi fu (Ant group) · Xin Zhang (Nankai University) · Weiqiang Wang (University of Southern California) · Shuo Chen (RIKEN) · Jian Yang (Nankai University)
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Ruoyu Feng (University of Science and Technology of China) · Wenming Weng (None) · Yanhui Wang (None) · Yuhui Yuan (Microsoft Research Asia) · Jianmin Bao (Microsoft) · Chong Luo (Microsoft Research Asia) · Zhibo Chen (University of Science and Technology of China) · Baining Guo (Microsoft Research)
CORES: Convolutional Response-based Score for Out-of-distribution Detection
Keke Tang (Guangzhou University) · Chao Hou (Guangzhou University) · Weilong Peng (None) · Runnan Chen (None) · Peican Zhu (Northwest Polytechnical University Xi'an) · Wenping Wang (Texas A&M University - College Station) · Zhihong Tian (Guangzhou University)
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
Zeliang Zhang (University of Rochester) · Mingqian Feng (University of Rochester) · Zhiheng Li (Amazon AGI) · Chenliang Xu (University of Rochester)
MoDE: CLIP Data Experts via Clustering
Jiawei Ma (Columbia University) · Po-Yao Huang (Facebook) · Saining Xie (Facebook) · Shang-Wen Li (Facebook) · Luke Zettlemoyer (University of Washington) · Shih-Fu Chang (Columbia University) · Wen-tau Yih (Meta Platforms, Inc.) · Hu Xu (FAIR, Multimodal Foundation)
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li (Huazhong University of Science and Technology) · Zhiguo Cao · Yizheng Wu (Nanyang Technological University) · Kewei Wang (Huazhong University of Science and Technology) · Ke Xian (Nanyang Technological University) · Zhe Wang (Sensetime Group Limited) · Guosheng Lin (Nanyang Technological University)
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker (Technical University Munich) · Marvin Eisenberger (Technical University Munich) · Daniel Cremers (Technical University Munich) · Laura Leal-Taixe (NVIDIA)
Dual-consistency Model Inversion for Non-exemplar Class Incremental Learning
Zihuan Qiu (University of Electronic Science and Technology of China) · Yi Xu (Dalian University of Technology) · Fanman Meng (University of Electronic Science and Technology of China) · Hongliang Li (University of Electronic Science and Technology of China, Tsinghua University) · Linfeng Xu (University of Electronic Science and Technology of China) · Qingbo Wu (University of Electronic Science and Technology of China)
Class Tokens Infusion for Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon (KAIST) · Hoyong Kwon (KAIST) · Hyeonseong Kim (KAIST) · Kuk-Jin Yoon (KAIST)
PointOBB: Learning Oriented Object Detection via Single Point Supervision
Junwei Luo (Wuhan University) · Xue Yang (Shanghai AI Laboratory) · Yi Yu (Southeast University) · Qingyun Li (Harbin Institute of Technology) · Junchi Yan (Shanghai Jiao Tong University) · Yansheng Li (Wuhan University)
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai (Max Planck Institute for Informatics) · Hiroyasu Akada (Max Planck Institute for Informatics) · Jian Wang (Max Planck Institute for Informatics) · Diogo Luvizon (Saarland Informatics Campus, Max-Planck Institute) · Christian Theobalt (MPI Informatik) · Vladislav Golyanik (MPI for Informatics)
LCD: Towards Hierarchical Embeddings with Localizability, Composability, and Decomposability Learned from Anatomy
Mohammad Reza Hosseinzadeh Taher (Arizona State University) · Michael Gotway (Mayo Clinic) · Jianming Liang (Arizona State University)
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Bingchen Li (University of Science and Technology of China) · Xin Li (None) · Hanxin Zhu (University of Science and Technology of China) · YEYING JIN (National University of Singapore) · Ruoyu Feng (University of Science and Technology of China) · Zhizheng Zhang (Microsoft Research) · Zhibo Chen (University of Science and Technology of China)
Category-Level Multi-Part Multi-Joint 3D Shape Assembly
Yichen Li (Massachusetts Institute of Technology) · Kaichun Mo (NVIDIA Research) · Yueqi Duan (None) · He Wang (None) · Jiequan Zhang (None) · Lin Shao (National University of Singapore) · Wojciech Matusik (Massachusetts Institute of Technology) · Leonidas Guibas (Stanford University)
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
YUNCHENG GUO (None) · Xiaodong Gu (Fudan University)
Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms
Joren Brunekreef (Netherlands Cancer Institute) · Eric Marcus (Netherlands Cancer Institute) · Ray Sheombarsing (None) · Jan-Jakob Sonke (Netherlands Cancer Institute) · Jonas Teuwen (Netherlands Cancer Institute)
A Category Agnostic Model for Visual Rearrangement
Yuyi Liu (Institute of Computing Technology,University of the Chinese Academy of Sciences) · Xinhang Song (None) · Weijie Li (Alibaba Group) · XIAOHAN Wang (Xi'an Jiaotong University) · Shuqiang Jiang (Institute of Computing Technology, Chinese Academy of Sciences)
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng (Yale University) · Hyoungseob Park (Yale University) · Fengyu Yang (Yale University) · Daniel Wang (Yale University) · Stefano Soatto (University of California, Los Angeles) · Dong Lao (University of California, Los Angeles) · Alex Wong (Yale University)
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang (Shanghai AI Laboratory) · Xiaohan Mao (Shanghai Jiaotong University) · Chenming Zhu (The Chinese University Of Hong Kong, Shenzhen) · Runsen Xu (The Chinese University of Hong Kong) · Ruiyuan Lyu (Shanghai AI Laboratory) · Peisen Li (Tsinghua University, Tsinghua University) · Xiao Chen (The Chinese University of Hong Kong) · Wenwei Zhang (None) · Kai Chen (Shanghai AI Laboratory) · Tianfan Xue (The Chinese University of Hong Kong) · Xihui Liu (The University of Hong Kong) · Cewu Lu (Shanghai Jiao Tong University) · Dahua Lin (The Chinese University of Hong Kong) · Jiangmiao Pang (Shanghai AI Laboratory)
Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
Alessandro Achille (California Institute of Technology) · Greg Ver Steeg (University of California, Riverside) · Tian Yu Liu (University of California, Los Angeles) · Matthew Trager (Amazon) · Carson Klingenberg (Amazon Web Services) · Stefano Soatto (AWS)
DETRs Beat YOLOs on Real-time Object Detection
Yian Zhao (Peking University) · Wenyu Lv (Baidu) · Shangliang Xu (Baidu) · Jinman Wei (Tianjin University) · Guanzhong Wang (Baidu) · Qingqing Dang (Baidu) · Yi Liu (None) · Jie Chen (Peking University)
DIOD: Self-Distillation Meets Object Discovery
Sandra Kara (CEA) · Hejer AMMAR (CEA) · Julien Denize (CEA) · Florian Chabot (CEA) · Quoc Cuong PHAM (CEA)
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Guillaume Jaume (Harvard University) · Anurag Vaidya (Massachusetts Institute of Technology) · Richard J. Chen (Harvard University) · Drew F. K. Williamson (Massachusetts General Hospital, Harvard University) · Paul Pu Liang (Carnegie Mellon University) · Faisal Mahmood (Harvard University)
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
LIn Zhao (Infinigence) · Tianchen Zhao (Tsinghua University, Tsinghua University) · Zinan Lin (Microsoft Research) · Xuefei Ning (Tsinghua University, Tsinghua University) · Guohao Dai (Shanghai Jiaotong University) · Huazhong Yang (Tsinghua University, Tsinghua University) · Yu Wang (Tsinghua University, Tsinghua University)
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu (University of Pennsylvania) · Lingzhi Zhang (School of Engineering and Applied Science, University of Pennsylvania) · Jianbo Shi (None)
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang (Princeton University) · Difan Liu (Adobe Research) · Yan Kang (None) · Yijun Li (Adobe Research) · Zhe Lin (Adobe Research) · Niraj Jha (Princeton University) · Yuchen Liu (None)
Deep Generative Model based Rate-Distortion for Image Downscaling Assessment
yuanbang liang (Cardiff University) · Bhavesh Garg (IIT Bombay) · Paul L. Rosin (Cardiff University) · Yipeng Qin (Cardiff University)
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng (University of Science and Technology of China) · Joy Hsu (Stanford University) · Weiyu Liu (Stanford University) · Jiajun Wu (Stanford University)
Forecasting of 3D Whole-body Human Poses with Grasping Objects
yan haitao (None) · Qiongjie Cui (Nanjing University of Science and Technology) · Jiexin Xie (Fudan University) · Shijie Guo (Fudan University)
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li (National University of Singapore) · Qianli Shen (National University of Singapore) · Kenji Kawaguchi (National University of Singapore)
Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer
Wenqiao Zhang (National University of Singapore) · Zheqi Lv (Zhejiang University)
Towards Co-Evaluation of Cameras, HDR, and Algorithms for Industrial-Grade 6DoF Pose Estimation
Agastya Kalra (Google) · Guy Stoppi (Intrinsic) · Dmitrii Marin (Intrinsic) · Vage Taamazyan (Intrinsic) · Aarrushi Shandilya (Intrinsic AI) · Rishav Agarwal (Intrinsic) · Anton Boykov (University of Waterloo) · Aaron Chong (Google) · Michael Stark (Intrinsic)
Correcting Diffusion Generation through Resampling
Yujian Liu (University of California, Santa Barbara) · Yang Zhang (International Business Machines) · Tommi Jaakkola (Massachusetts Institute of Technology) · Shiyu Chang (UC Santa Barbara)
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang (Shanghai Jiao Tong University & Tsinghua University) · Yang Liu (Tsinghua University, Tsinghua University) · Yang Hua (Queen's University Belfast) · Jian Cao (Shanghai Jiaotong University)
Partial-to-Partial Shape Matching with Geometric Consistency
Viktoria Ehm (Technische Universität München) · Maolin Gao (None) · Paul Roetzer (University of Bonn) · Marvin Eisenberger (Technical University Munich) · Daniel Cremers (Technical University Munich) · Florian Bernard (University of Bonn)
Deep Imbalanced Regression via Hierarchical Classification Adjustment
Haipeng Xiong (National University of Singapore) · Angela Yao (National University of Singapore)
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
CONG MA (Senseauto Research) · Qiao Lei (SenseAuto Research) · Chengkai Zhu (SenseAuto Research) · Kai Liu (SenseAuto Research) · Zelong Kong (SenseAuto Research) · Liqing (SenseAuto) · Xueqi Zhou (Beijing Sensetime Technology Development Co., Ltd.) · Yuheng KAN (Zhejiang University) · Wei Wu (Tsinghua University, Tsinghua University)
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
song yiran (None) · Qianyu Zhou (Shanghai Jiao Tong University) · Xiangtai Li (Nanyang Technological University) · Deng-Ping Fan (ETH Zurich) · Xuequan Lu (La Trobe University) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
Text-guided Explorable Image Super-resolution
Kanchana Vaishnavi Gandikota (None) · Paramanand Chandramouli (Universität Siegen)
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Kaiwen Zhang (Tsinghua University) · Yifan Zhou (Nanyang Technological University) · Xudong XU (Shanghai AI Laboratory) · Bo Dai (Shanghai AI Laboratory) · Xingang Pan (None)
NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Ziyi Chen (Zhejiang University) · Xiaolong Wu (Georgia Institute of Technology) · Yu Zhang (Zhejiang University)
Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis
Mingyang Zhao (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Jiang Jingen (Shandong University) · Lei Ma (Peking University) · Shiqing Xin (Shandong University) · Gaofeng Meng (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Dong-Ming Yan (Institute of Automation, Chinese Academy of Sciences)
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
Gihun Lee (KAIST AI) · Minchan Jeong (Korea Advanced Institute of Science and Technology) · SangMook Kim (KAIST) · Jaehoon Oh (Samsung Advanced Institute of Technology) · Se-Young Yun (KAIST)
LAENeRF: Local Appearance Editing for Neural Radiance Fields
Lukas Radl (Graz University of Technology) · Michael Steiner (Technische Universität Graz) · Andreas Kurz (Technische Universität Graz) · Markus Steinberger (Technische Universität Graz)
Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
Haiwei Chen (USC-ICT, Vision and Graphics Lab) · Yajie Zhao (University of Southern California)
Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
Philipp Schröppel (University of Freiburg, Germany) · Christopher Wewer (Max Planck Institute for Informatics, Saarland Informatics Campus) · Jan Lenssen (Saarland Informatics Campus, Max-Planck Institute) · Eddy Ilg (None) · Thomas Brox (University of Freiburg)
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
Yunxiao Shi (Qualcomm AI Research) · Manish Singh (Qualcomm AI Research) · Hong Cai (Qualcomm AI Research) · Fatih Porikli (QualComm)
Exploiting Style Latent Flows for Generalizing Video Deepfake Detection
Jongwook Choi (Chung-Ang University) · Taehoon Kim (Chung-Ang University) · Yonghyun Jeong (NAVER) · Seungryul Baek (UNIST) · Jongwon Choi (Chung-Ang University)
Bayesian Differentiable Physics for Cloth Digitalization
Deshan Gong (University of Leeds) · Ningtao Mao (University of Leeds) · He Wang (None)
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
Xiaolong Deng (Shenzhen University) · Huisi Wu (Shenzhen University) · Runhao Zeng (Shenzhen MSU-BIT University) · Jing Qin (Hong Kong Polytechnic University)
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang (Alibaba Group) · Chaojie Mao (Alibaba Group) · Yulin Pan (Alibaba Group, China) · Zhen Han (Alibaba Group) · Jingfeng Zhang (Alibaba Group)
DREAM: Diffusion Rectification and Estimation-Adaptive Models
Jinxin Zhou (Ohio State University, Columbus) · Tianyu Ding (Microsoft) · Tianyi Chen (Microsoft) · Jiachen Jiang (Ohio State University, Columbus) · Ilya Zharkov (Microsoft) · Zhihui Zhu (Ohio State University, Columbus) · Luming Liang (Microsoft)
Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling
Leon Sick (Ulm University) · Dominik Engel (Ulm University) · Pedro Hermosilla (Technische Universität Wien) · Timo Ropinski (Ulm University)
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace (University of Trento) · Aliaksandr Siarohin (Snap Inc.) · Ivan Skorokhodov (KAUST) · Ekaterina Deyneka (Snap Inc.) · Tsai-Shien Chen (University of California, Merced) · Anil Kag (Snap Inc.) · Yuwei Fang (Snap Inc.) · Aleksei Stoliar (None) · Elisa Ricci (University of Trento) · Jian Ren (Snap Inc.) · Sergey Tulyakov (Snap Inc.)
URHand: Universal Relightable Hands
Zhaoxi Chen (Nanyang Technological University) · Gyeongsik Moon (None) · Kaiwen Guo (Google) · Chen Cao (Facebook) · Stanislav Pidhorskyi (Meta) · Tomas Simon (Meta) · Rohan Joshi (Facebook) · Yuan Dong (Facebook) · Yichen Xu (Meta Platforms Inc.) · Bernardo Pires (Meta Platforms Inc.) · He Wen (Meta Platforms, Inc.) · Lucas Evans (Meta) · Bo Peng (Meta Platforms Inc.) · Julia Buffalini (Meta) · Autumn Trimble (Meta) · Kevyn McPhail (Meta) · Melissa Schoeller (Meta Platforms Inc) · Shoou-I Yu (Reality Labs Research, Meta) · Javier Romero (None) · Michael Zollhoefer (Meta) · Yaser Sheikh (Meta) · Ziwei Liu (Nanyang Technological University) · Shunsuke Saito (Reality Labs Research)
DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Yuming Gu (USC Institute for Creative Technologies, University of Southern California) · Hongyi Xu (Bytedance) · You Xie (Bytedance) · Guoxian Song (Bytedance Inc) · Yichun Shi (ByteDance) · Di Chang (University of Southern California) · Jing Yang (USC Institute for Creative Technologies) · Linjie Luo (ByteDance Inc.)
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni (Institute of Automation, Chinese Academy of Sciences) · Hongbo Zhao (Institute of Automation, Chinese Academy of Sciences) · Chenghao Zhang (Alibaba Group) · Ke Hu (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Gaofeng Meng (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Zhaoxiang Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Shiming Xiang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
Generating Human Motion in 3D Scenes from Text Descriptions
Zhi Cen (Zhejiang University) · Huaijin Pi (Zhejiang University) · Sida Peng (None) · Zehong Shen (Zhejiang University) · Minghui Yang (Ant Group) · Shuai Zhu (Ant Group) · Hujun Bao (Zhejiang University) · Xiaowei Zhou (None)
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
JINLONG LI (Cleveland State University) · Baolu Li (Cleveland State University) · Zhengzhong Tu (University of Texas at Austin) · XINYU LIU (Cleveland State University) · Qing Guo (Institute of High Performance Computing, Singapore, A*STAR) · Felix Juefei Xu () · Runsheng Xu (University of California, Los Angeles) · Hongkai Yu (Cleveland State University)
Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
Yuang Ai (Institute of Automation, Chinese Academy of Sciences) · Xiaoqiang Zhou (University of Science and Technology of China) · Huaibo Huang (Institute of Automation, Chinese Academy of Sciences) · Lei Zhang (The Hong Kong Polytechnic University) · Ran He (None)
Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge
Haoxiang Ma (Beihang University) · Modi Shi (Beijing University of Aeronautics and Astronautics) · Boyang GAO (Geometry Robotics ltd. & Harbin Institute of Technology) · Di Huang (Beihang University)
Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment
Aobo Li (Xidian University) · Jinjian Wu (Xidian University) · Yongxu Liu (Xidian University) · Leida Li (Xidian University)
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee (Ulsan National Institute of Science and Technology) · Soyeong Kwon (Ulsan National Institute of Science and Technology) · Taehwan Kim (UNIST)
Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Devikalyan Das (Max Planck Institute for Informatics) · Christopher Wewer (Max Planck Institute for Informatics, Saarland Informatics Campus) · Raza Yunus (Saarland Informatics Campus, Max-Planck Institute) · Eddy Ilg (None) · Jan Lenssen (Saarland Informatics Campus, Max-Planck Institute)
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham (None) · Matthew Fisher (Adobe Research) · James Hays (Georgia Institute of Technology) · Nicholas Kolkin (Adobe Systems) · Yuchen Liu (None) · Richard Zhang (Adobe Systems) · Tobias Hinz (Adobe Systems)
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez (University of Illinois at Urbana Champaign) · Teck-Yian Lim (DSO National Laboratories) · Minh Do (University of Illinois at Urbana-Champaign) · Raymond A. Yeh (Purdue University)
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang (Beijing University of Aeronautics and Astronautics) · Hao Wen (Beijing University of Aeronautics and Astronautics) · Junting Dong (None) · Yaohui Wang (Shanghai AI Laboratory) · Yangguang Li (Shanghai AI Laboratory) · Xinyuan Chen (Shanghai Artificial Intelligence Laboratory) · Yan-Pei Cao (Tencent ARC Lab) · Ding Liang (Tsinghua University, Tsinghua University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Bo Dai (Shanghai AI Laboratory) · Lu Sheng (Beihang University)
RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses
bedrettin cetinkaya (Middle East Technical University) · Sinan Kalkan (Middle East Technical University) · Emre Akbas (METU)
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song () · Wenhao Chai (University of Washington) · Guanhong Wang (Zhejiang University) · Haoyang Zhou (Zhejiang University) · Feiyang Wu (Zhejiang University) · Yucheng Zhang (Zhejiang University) · Tian Ye (Hong Kong University of Science and Technology, Guangzhou Campus) · Haozhe Chi (Zhejiang University) · Xun Guo (Microsoft Research Asia) · Yanting Zhang (Donghua University, Shanghai) · Yan Lu (Microsoft Research Asia) · Jenq-Neng Hwang (None) · Gaoang Wang (Zhejiang University)
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi (Independent Researcher)
Learning Triangular Distribution in Visual World
Ping Chen (MicroBT Inc.) · Xingpeng Zhang (Southwest Petroleum University) · Chengtao Zhou (MicroBT) · dichao Fan (MicroBT) · Peng Tu (RuqiMobility Inc.) · Le Zhang (Shenzhen MicroBT Electronics Technology Corporation) · Yanlin Qian (Tampere University)
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng (University of Oxford) · Andrea Vedaldi (University of Oxford)
Fine-grained Bipartite Concept Factorization for Clustering
Chong Peng (None) · Pengfei Zhang (Qingdao University) · Yongyong Chen (Harbin Institute of Technology (Shenzhen)) · zhao kang (University of Electronic Science and Technology of China) · Chenglizhao Chen (China University of Petroleum) · Qiang Cheng (University of Kentucky)
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Prashant Kumar (Indian Institute of Technology Delhi) · Kshitij Madhav Bhat (Indian Institute of Technology Indore) · Vedang Bhupesh Shenvi Nadkarni (Birla Institute of Technology and Science Pilani (BITS Pilani)) · Prem Kalra (Indian Institute of Technology, Delhi)
Generalized Event Cameras
Varun Sundar (University of Wisconsin, Madison) · Matthew Dutson (University of Wisconsin, Madison) · Andrei Ardelean (NovoViz) · Claudio Bruschini (EPFL - EPF Lausanne) · Edoardo Charbon (EPFL - EPF Lausanne) · Mohit Gupta (Department of Computer Sciences, University of Wisconsin - Madison)
Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
Yuang Ai (Institute of Automation, Chinese Academy of Sciences) · Huaibo Huang (Institute of Automation, Chinese Academy of Sciences) · Xiaoqiang Zhou (University of Science and Technology of China) · Jiexiang Wang (University of Science and Technology of China) · Ran He (None)
DIEM: Decomposition-Integration Enhancing Multimodal Insights
Xinyi Jiang (None) · Guoming Wang (Zhejiang University) · Junhao Guo (Zhejiang University) · Juncheng Li (Zhejiang University) · Wenqiao Zhang (National University of Singapore) · Rongxing Lu (University of New Brunswick) · Siliang Tang (Zhejiang University)
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar (Indian Institute of Science, Bangalore) · Abhijnya Bhat (Indian Institute of Science, Indian institute of science, Bangalore) · Abhipsa Basu (Indian Institute of Science) · Saswat Mallick (Indian Institute of Science, Indian institute of science, Bangalore) · Jogendra Kundu Kundu (None) · R. Venkatesh Babu (Indian Institute of Science)
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Zhen Zhao (East China Normal University) · Jingqun Tang (Bytedance) · Chunhui Lin (Bytedance) · Binghong Wu (Bytedance) · Can Huang (Bytedance) · Hao Liu (Bytedance Inc.) · Xin Tan (East China Normal University) · Zhizhong Zhang (East China Normal University) · Yuan Xie (East China Normal University)
NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Vikas Thamizharasan (University of Massachusetts Amherst) · Difan Liu (Adobe Research) · Matthew Fisher (Adobe Research) · Nanxuan Zhao (Adobe Research) · Evangelos Kalogerakis (UMass Amherst) · Michal Lukáč (Adobe Systems)
Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
Qunliang Xing (Beihang University) · Mai Xu (Beihang University, Tsinghua University) · Shengxi Li (Beihang University) · Xin Deng (Beijing University of Aeronautics and Astronautics) · Meisong Zheng (Alibaba Group) · huaida liu (Alibaba Group) · Ying Chen (Alibaba Group)
Accurate Training Data for Occupancy Map Prediction in Automated Driving using Evidence Theory
Jonas Kälble (Bosch Center for Artificial Intelligence) · Sascha Wirges (Robert Bosch GmbH, Bosch) · Maxim Tatarchenko (Bosch) · Eddy Ilg (None)
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
Chao Xu (Zhejiang University) · Yang Liu (Alibaba Group) · Jiazheng Xing (Zhejiang University) · Weida Wang (Xingji Meizu Group) · Mingze Sun (None) · Jun Dan (Zhejiang University) · Tianxin Huang (Tencent youtu lab) · Siyuan Li (Westlake University, Zhejiang University) · Zhi-Qi Cheng (Carnegie Mellon University) · Ying Tai (Nanjing University) · Baigui Sun (Alibaba Group)
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Shuai Yang (Nanyang Technological University) · Yifan Zhou (Nanyang Technological University) · Ziwei Liu (Nanyang Technological University) · Chen Change Loy (Nanyang Technological University)
Backdoor Defense via Test-Time Detecting and Repairing
Jiyang Guan (Institute of Automation, Chinese Academy of Sciences) · Jian Liang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Ran He (None)
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
Zhixuan Liang (The University of Hong Kong) · Yao Mu (The University of Hong Kong) · Hengbo Ma (None) · Masayoshi Tomizuka (University of California, Berkeley) · Mingyu Ding (UC Berkeley) · Ping Luo (The University of Hong Kong)
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia (Baidu) · Xinliang Wang (Baidu) · Feng Lv (Baidu) · Xin Hao (Beijing Institute of Technology) · Yifeng Shi (Baidu)
ZONE: Zero-Shot Instruction-Guided Local Editing
Shanglin Li (Beijing University of Aeronautics and Astronautics) · Bohan Zeng (Beijing University of Aeronautics and Astronautics) · Yutang Feng (Beijing University of Aeronautics and Astronautics) · Sicheng Gao (Bayerische Julius-Maximilians-Universität Würzburg) · Xuhui Liu (Beihang University) · Jiaming Liu (Xiaohongshu) · Li Lin (Xiamen University) · Xu Tang (Shanghaitech University) · Yao Hu (Zhejiang University, Tsinghua University) · Jianzhuang Liu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Baochang Zhang (Beihang University)
Learning to Count without Annotations
Lukas Knobel (University of Amsterdam & TNO) · Tengda Han (University of Oxford, University of Oxford) · Yuki Asano (University of Amsterdam)
HIVE: Harnessing Human Feedback for Instructional Visual Editing
Shu Zhang (SalesForce.com) · Xinyi Yang (Salesforce Research) · Yihao Feng (Salesforce Research) · Can Qin (Northeastern University) · Chia-Chih Chen (Salesforce) · Ning Yu (Salesforce Research) · Zeyuan Chen (SalesForce.com) · Huan Wang (SalesForce.com) · Silvio Savarese (Salesforce) · Stefano Ermon (Stanford University) · Caiming Xiong (Salesforce Research) · Ran Xu (SalesForce.com)
Towards Backward-Compatible Continual Learning of Image Compression
Zhihao Duan (Purdue University) · Ming Lu (Nanjing University) · Justin Yang (Purdue University) · Jiangpeng He (Purdue University) · Zhan Ma (Nanjing University) · Fengqing Zhu (Purdue University, Purdue University)
Clustering for Protein Representation Learning
Ruijie Quan (Zhejiang University) · Wenguan Wang (Zhejiang University) · Fan Ma (None) · Hehe Fan (None) · Yi Yang (Zhejiang University)
Learning to Segment Referred Objects from Narrated Egocentric Videos
Yuhan Shen (Northeastern University) · Huiyu Wang (Facebook) · Xitong Yang (Meta) · Matt Feiszli (Meta AI) · Ehsan Elhamifar (None) · Lorenzo Torresani (Facebook) · Effrosyni Mavroudi ()
What Sketch Explainability Really Means for Downstream Tasks?
Hmrishav Bandyopadhyay (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Aneeshan Sain (University of Surrey) · Tao Xiang (University of Surrey) · Yi-Zhe Song (None)
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran (National University of Singapore) · Xiaodong Cun (Tencent AI Lab) · Jia-Wei Liu (National University of Singapore) · Rui Zhao (None) · Song Zijie (Fudan University) · Xintao Wang (Tencent) · Jussi Keppo (National University of Singapore) · Mike Zheng Shou (National University of Singapore)
SURE: SUrvey REcipes for building reliable and robust deep networks
Yuting Li (China Three Gorges University) · Yingyi Chen (Department of Electrical Engineering, KU Leuven, Belgium, KU Leuven) · Xuanlong Yu (Université Paris-Saclay) · Dexiong Chen (Max Planck Institute of Biochemistry) · Xi Shen (Tencent AI Lab)
Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis
Tianci Bi (Xi'an Jiaotong University) · Xiaoyi Zhang (Research, Microsoft) · Zhizheng Zhang (Microsoft Research) · Wenxuan Xie (Microsoft Research Asia) · Cuiling Lan (Microsoft) · Yan Lu (Microsoft Research Asia) · Nanning Zheng (Xi'an Jiaotong University)
Label Propagation for Zero-shot Classification with Vision-Language Models
Vladan Stojnić (Czech Technical University in Prague) · Yannis Kalantidis (NAVER LABS Europe) · Giorgos Tolias (CTU in Prague)
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
Songchun Zhang (Zhejiang University) · Yibo Zhang (Jilin University) · Quan Zheng (Institute of Software, Chinese Academy of Sciences) · Rui Ma (Jilin University) · Wei Hua (Zhejiang Lab) · Hujun Bao (Zhejiang University) · Weiwei Xu (Zhejiang University) · Changqing Zou (Zhejiang University)
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou (Huazhong University of Science and Technology) · Shiming Chen (Carnegie Mellon University) · Shuhuang Chen (Huazhong University of Science and Technology) · Ziming Hong (The University of Sydney) · Yan Wang (Alibaba Group) · Xuetao Feng (Alibaba Group) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Fahad Shahbaz Khan (MBZUAI; Linköping University) · Xinge You (Huazhong University of Science and Technology)
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
Zhipeng Du (University of Edinburgh & King's College London) · Miaojing Shi (King's College London) · Jiankang Deng (Imperial College London & Huawei UKRD)
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei (Johns Hopkins University) · Mauricio Delbracio (None) · Hossein Talebi (Google Research) · Zhengzhong Tu (University of Texas at Austin) · Vishal M. Patel (Johns Hopkins University) · Peyman Milanfar (Peyman Milanfar)
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
Haoyu Ma (University of California, Irvine) · Shahin Mahdizadehaghdam (Meta) · Bichen Wu (Facebook) · Zhipeng Fan (Facebook) · Yuchao Gu (None) · Wenliang Zhao (Meta Inc) · Lior Shapira (Meta) · Xiaohui Xie (University of California, Irvine)
Unsegment Anything by Simulating Deformation
Jiahao Lu (National University of Singapore) · Xingyi Yang (National University of Singapore) · Xinchao Wang (National University of Singapore)
Z*: Zero-shot Style Transfer via Attention Reweighting
Yingying Deng (None) · Xiangyu He (Meituan) · Fan Tang (Institute of Computing Technology, CAS) · Weiming Dong (Institute of Automation, Chinese Academy of Sciences)
Cross-spectral Gated-RGB Stereo Depth Estimation
Samuel Brucker (Torc Robotics) · Stefanie Walz (Mercedes-Benz AG) · Mario Bijelic (Princeton University) · Felix Heide (Department of Computer Science, Princeton University)
Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
Qingping Zheng (Northwestern Polytechnical University) · Ling Zheng (Tsinghua-Fuzhou Institute for Data Technology) · Yuanfan Guo (Huawei Technologies Ltd.) · Ying Li (Northwestern Polytechnical University) · Songcen Xu (Huawei Noah's Ark Lab) · Jiankang Deng (Imperial College London & Huawei UKRD) · Hang Xu (Huawei Noah's Ark Lab)
Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
Jiyuan Zhang (Peking University) · Shiyan Chen (Peking University) · Yajing Zheng (Peking University) · Zhaofei Yu (Peking University) · Tiejun Huang (Peking University)
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang (Nankai University) · Peng-Tao Jiang (vivo Mobile Communication (Hangzhou) Co., Ltd.) · Qibin Hou (Nankai University) · Hao Zhang (vivo Mobile Communication (Hangzhou) Co., Ltd.) · Jinwei Chen (vivo Mobile Communication Co., Ltd.) · Bo Li (vivo Mobile Communication Co., Ltd.)
LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
Dat NGUYEN (University of Luxembourg) · Nesryne Mejri (SnT, University of Luxembourg) · Inder Pal Singh (University of Luxembourg) · Polina Kuleshova (University of Luxembourg) · Marcella Astrid (University of Luxembourg) · Anis Kacem (University of Luxembourg) · Enjie Ghorbel (CRISTAL laboratory, ENSI, University of Manouba) · Djamila Aouada (SnT, University of Luxembourg)
EASE-DETR: Easing the Competition among Object Queries
Yulu Gao (Beijing University of Aeronautics and Astronautics) · Yifan Sun (Baidu Research) · Xudong Ding (Beijing University of Aeronautics and Astronautics) · Chuyang Zhao (Baidu) · Si Liu (Beihang University)
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng (Tianjin University) · Dan Song (Tianjin University) · Weizhi Nie (Tianjin University) · Hongshuo Tian (Tianjin University) · Tongtong Wang (Tencent LightSpeed Studio) · An-An Liu (Tianjin University)
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang (The Chinese University of Hong Kong) · Xiaohan Ding (Tencent AI Lab) · Kaixiong Gong (None) · Yixiao Ge (Tencent) · Ying Shan (Tencent) · Xiangyu Yue (The Chinese University of Hong Kong)
Transcriptomics-guided Slide Representation Learning in Computational Pathology
Guillaume Jaume (Harvard University) · Lukas Oldenburg (Harvard University) · Anurag Vaidya (Massachusetts Institute of Technology) · Richard J. Chen (Harvard University) · Drew F. K. Williamson (Massachusetts General Hospital, Harvard University) · Thomas Peeters (Harvard University) · Andrew Song (Brigham and Women's hospital) · Faisal Mahmood (Harvard University)
ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
Xiangjun Gao (The Hong Kong University of Science and Technology (HKUST)) · Xiaoyu Li (Tencent AI Lab) · Chaopeng Zhang (Tencent AI Lab) · Qi Zhang (Tencent AI Lab) · Yan-Pei Cao (Tencent ARC Lab) · Ying Shan (Tencent) · Long Quan (The Hong Kong University of Science and Technology)
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng (University of Science and Technology of China) · Binxin Yang (University of Science and Technology of China) · Tiankai Hang (Southeast University) · Chen Li (Xi'an Jiaotong University) · Shuyang Gu (Research, Microsoft) · Ting Zhang (Beijing Normal University) · Jianmin Bao (Microsoft) · Zheng Zhang (Microsoft) · Houqiang Li (University of Science and Technology of China) · Han Hu (Microsoft Research Asia) · Dong Chen (Microsoft) · Baining Guo (Microsoft Research)
Making Visual Sense of Oracle Bones for You and Me
Runqi Qiao (Beijing University of Posts and Telecommunications) · LAN YANG (Beijing University of Posts and Telecommunications) · Kaiyue Pang (SketchX AI) · Honggang Zhang (Beijing University of Posts and Telecommunications)
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
Tz-Ying Wu (University of California, San Diego) · Chih-Hui Ho (University of California San Diego) · Nuno Vasconcelos (University of California San Diego)
Mosaic-SDF for 3D Generative Models
Lior Yariv (Weizmann Institute of Science) · Omri Puny (Weizmann Institute of Science) · Oran Gafni (Meta AI) · Yaron Lipman (Facebook)
Characteristics Matching Based Hash Codes Generation for Efficient Fine-grained Image Retrieval
Zhen-Duo Chen (Shandong University) · Li-Jun Zhao (Shandong University) · Zi-Chao Zhang (Shandong University) · Xin Luo (Shandong University) · Xin-Shun Xu (Shandong University)
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Bingyan Liu (South China University of Technology) · Chengyu Wang (Alibaba Group) · Tingfeng Cao (South China University of Technology) · Kui Jia (South China University of Technology) · Jun Huang (University of Science and Technology of China)
DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
Huiqiang Sun (None) · Xingyi Li (Huazhong University of Science and Technology) · Liao Shen (Huazhong University of Science and Technology) · Xinyi Ye (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology) · Ke Xian (Nanyang Technological University) · Zhiguo Cao ()
Binarized Low-light Raw Video Enhancement
Gengchen Zhang (Beijing Institute of Technology) · Yulun Zhang (Shanghai Jiao Tong University) · Xin Yuan (Westlake University) · Ying Fu (None)
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu (Tsinghua University, Tsinghua University) · Yuan Yao (Tsinghua University) · Haoye Zhang (Tsinghua University, Tsinghua University) · Taiwen He (Tsinghua University, Tsinghua University) · Yifeng Han (Zhejiang University) · Ganqu Cui (Tsinghua University, Tsinghua University) · Jinyi Hu (Tsinghua University, Tsinghua University) · Zhiyuan Liu (Tsinghua University) · Hai-Tao Zheng (Tsinghua University, Tsinghua University) · Maosong Sun (Tsinghua University, Tsinghua University)
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek (Facebook) · Florian Bordes (Meta AI) · Pietro Astolfi (Meta AI) · Mary Williamson (Meta AI (FAIR)) · Vasu Sharma (Meta AI / CMU) · Adriana Romero-Soriano (Meta)
GaussianAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Jing Wen (University of Illinois Urbana-Champaign) · Xiaoming Zhao (UIUC) · Jason Ren (Apple) · Alexander G. Schwing (UIUC) · Shenlong Wang (University of Illinois, Urbana Champaign)
Coherent Temporal Synthesis for Incremental Action Segmentation
GUODONG DING (National University of Singapore) · Hans Golong (National University of Singapore) · Angela Yao (National University of Singapore)
Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior
Zhenyu Chen (Nanjing University) · Jie Guo (Nanjing University) · Shuichang Lai (Nanjing University) · Ruoyu Fu (Nanjing University) · mengxun kong (None) · Chen Wang (Nanjing University) · Hongyu Sun (Guangdong Oppo Mobile Telecommunications Corp., Ltd) · Zhebin Zhang (OPPO) · Chen Li (Innopeak Technology Inc.) · Yanwen Guo (Nanjing University)
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding
Sai Wang (Wuhan University) · Yutian Lin (Wuhan University) · Yu Wu (Wuhan University)
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures
Zhangyang Xiong () · Chenghong Li (The Chinese University of Hong Kong, Shenzhen) · Kenkun Liu (The Chinese University of Hong Kong, Shenzhen) · Hongjie Liao (The Chinese University of Hong Kong, Shenzhen) · Jianqiao HU (The Chinese University of Hong Kong, Shenzhen) · Junyi Zhu (The Chinese University of Hong Kong, Shenzhen) · Shuliang Ning (The Chinese University of Hong Kong, Shenzhen) · Lingteng Qiu (None) · Chongjie Wang (The Chinese University of Hong Kong, Shenzhen) · Shijie Wang (The Chinese University of Hong Kong, Shenzhen) · Shuguang Cui (The Chinese University of Hong Kong, Shenzhen) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
ES³: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
Yuanhang Zhang (Institute of Computing Technology, Chinese Academy of Sciences) · Shuang Yang (Institute of Computing Technology, Chinese Academy of Sciences) · Shiguang Shan (Institute of Computing Technology, Chinese Academy of Sciences) · Xilin Chen (None)
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Weihuang Liu (University of Macau) · Xi Shen (Tencent AI Lab) · Haolun Li (University of Macau) · Xiuli Bi (Chongqing University of Posts and Telecommunications) · Bo Liu (Chongqing University of Posts and Telecommunications) · Chi-Man Pun (University of Macau) · Xiaodong Cun (Tencent AI Lab)
KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
Ruida Zhang (Department of Automation, Tsinghua University, Tsinghua University) · Chenyangguang Zhang (Tsinghua University) · Yan Di (Technische Universität München) · Fabian Manhardt (Google) · Xingyu Liu (Tsinghua University, Tsinghua University) · Federico Tombari (Google, TUM) · Xiangyang Ji (Tsinghua University)
Communication-Efficient Federated Learning with Accelerated Client Gradient
Geeho Kim (Seoul National University) · Jinkyu Kim (Seoul National University) · Bohyung Han (Seoul National University)
Taming Stable Diffusion for Text to 360° Panorama Image Generation
Cheng Zhang (None) · Qianyi Wu (Monash University) · Camilo Cruz Gambardella (Monash University) · Xiaoshui Huang (Shanghai AI Laboratory) · Dinh Phung (Monash University) · Wanli Ouyang (University of Sydney) · Jianfei Cai (Monash University)
Real-time 3D-aware Portrait Video Relighting
Ziqi Cai (Chinese Academy of Sciences & Beijing Jiaotong University) · Kaiwen Jiang (None) · Shu-Yu Chen (Chinese Academy of Sciences) · Yu-Kun Lai (Cardiff University) · Hongbo Fu (City University of Hong Kong) · Boxin Shi (Peking University) · Lin Gao (University of Chinese Academy of Sciences)
Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Huajian Huang (The Hong Kong University of Science and Technology) · Longwei Li (SUN YAT-SEN UNIVERSITY) · Hui Cheng (SUN YAT-SEN UNIVERSITY) · Sai-Kit Yeung (The Hong Kong University of Science and Technology (HKUST))
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang (East China University of Science and Technology) · Mengping Yang (East China University of Science and Technology) · Qin Zhou (East China University of Science and Technology) · Zhe Wang (East China University of Science and Technology)
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
WENCAN CHENG (None) · Hao Tang (ETH Zurich and CMU) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Jong Hwan Ko (Sungkyunkwan University (SKKU))
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li (Osaka University) · Taishi Ono (Sony Semiconductor Solutions Europe) · Takeshi Uemori (Sony Semiconductor Solutions Corporation) · Hajime Mihara (Sony Semiconductor Solutions Corporation) · Alexander Gatto (Sony Semiconductor Solutions Europe) · Hajime Nagahara (Osaka University) · Yusuke Moriuchi (Sony Semiconductor Solutions Corporation)
MaskPLAN: Masked Generative Layout Planning from Partial Input
Hang Zhang (ETHZ - ETH Zurich) · Anton Savov (ETHZ - ETH Zurich) · Benjamin Dillenburger (ETHZ - ETH Zurich)
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Qucheng Peng (University of Central Florida) · Ce Zheng (University of Central Florida) · Chen Chen ()
Rapid 3D Model Generation with Intuitive 3D Input
Tianrun Chen (Zhejiang University) · Chaotao Ding (Huzhou university) · Shangzhan Zhang (Zhejiang University) · Chunan Yu (Huzhou University) · Ying Zang (Huzhou University) · Zejian Li (Zhejiang University) · Sida Peng (None) · Lingyun Sun (Zhejiang University)
MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception
Thien-Minh Nguyen (Nanyang Technological University) · Shenghai Yuan (National Technological University) · Thien Nguyen (Nanyang Technological University) · Pengyu Yin (Nanyang Technological University) · Haozhi Cao (Nanyang Technological University) · Lihua Xie (Nanyang Technological University) · Maciej Wozniak (KTH Royal Institute of Technology) · Patric Jensfelt (KTH Royal Institute of Technology, Stockholm, Sweden) · Marko Thiel (Hamburg University of Technology) · Justin Ziegenbein (Hamburg University of Technology) · Noel Blunder (Institute for Technical Logistics - Hamburg University of Technology)
Open-vocabulary object 6D pose estimation
Jaime Corsetti (Fondazione Bruno Kessler & University of Trento) · Davide Boscaini (Fondazione Bruno Kessler) · Changjae Oh (Queen Mary University London) · Andrea Cavallaro (EPFL - EPF Lausanne) · Fabio Poiesi (Fondazione Bruno Kessler)
Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Stanislaw Szymanowicz (University of Oxford, University of Oxford) · Christian Rupprecht (University of Oxford) · Andrea Vedaldi (University of Oxford)
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Yuhao Liu (City University of Hong Kong) · Zhanghan Ke (City University of Hong Kong) · Fang Liu (City University of Hong Kong) · Nanxuan Zhao (Adobe Research) · Rynson W.H. Lau (City University of Hong Kong)
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu (None) · Guandao Yang (None) · Zhibing Li (The Chinese University of Hong Kong) · Kai Zhang (Adobe Systems) · Ziwei Liu (Nanyang Technological University) · Leonidas Guibas (Stanford University) · Dahua Lin (The Chinese University of Hong Kong) · Gordon Wetzstein (Stanford University)
CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models
Yasiru Ranasinghe (Johns Hopkins University) · Nithin Gopalakrishnan Nair (Johns Hopkins University) · Wele Gedara Chaminda Bandara (Johns Hopkins University) · Vishal M. Patel (Johns Hopkins University)
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian (University of Science and Technology of China) · Qi Cai (JD) · Yingwei Pan (HiDream.ai) · Yehao Li (HiDream.ai) · Ting Yao (JD AI Research) · Qibin Sun (University of Science and Technology of China) · Tao Mei (JD Explore Academy)
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
Qing Yu (LY Corporation) · Mikihiro Tanaka (LY Corporation) · Kent Fujiwara (LY Corporation)
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Minh-Tuan Tran (Monash University) · Trung Le (Monash University) · Xuan-May Le (University of Melbourne) · Mehrtash Harandi (Monash University) · Dinh Phung (Monash University)
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Zixiong Huang (South China University of Technology) · Qi Chen (The University of Adelaide) · Libo Sun (University of Adelaide) · Yifan Yang (South China University of Technology) · Naizhou Wang (CVTE research) · Qi Wu (University of Adelaide) · Mingkui Tan (South China University of Technology)
Unsupervised Salient Instance Detection
Xin Tian (Huawei Technologies Ltd.) · Ke Xu (City University of Hong Kong) · Rynson W.H. Lau (City University of Hong Kong)
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
Kim Youwang (Pohang University of Science and Technology) · Tae-Hyun Oh (None) · Gerard Pons-Moll (University of Tübingen)
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang (University of Sydney) · Gil Avraham (Amazon) · Yan Zuo (Amazon) · Sameera Ramasinghe (Amazon) · Loris Bazzani (Amazon) · Anton van den Hengel (University of Adelaide)
L-MAGIC: Language Model Assisted Generation of Images with Consistency
zhipeng cai (Intel Labs) · Matthias Mueller (None) · Reiner Birkl (Intel Corporation) · Diana Wofk (Intel) · Shao-Yen Tseng (Intel) · JunDa Cheng (Huazhong University of Science and Technology) · Gabriela Ben Melech Stan (Intel) · Vasudev Lal (None) · Michael Paulitsch (Intel)
MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Ziyang Chen (Guizhou University) · Wei Long (Guizhou University) · He Yao (Guizhou University) · Yongjun Zhang (Guizhou University) · Bingshu Wang (Northwest Polytechnical University Xi'an) · Yongbin Qin (Guizhou University) · Jia Wu (Monash University)
Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
Jinguo Luo · Weihong Ren (Harbin Institute of Technology, Shenzhen) · Weibo Jiang (Harbin Institute of Technology) · Xi'ai Chen (Shenyang Institute of Automation, Chinese Academy of Sciences) · Qiang Wang (Shenyang University) · Zhi Han (Shenyang Institute of Automation, Chinese Academy of Sciences) · Honghai LIU (Harbin Institute of Technology, Shenzhen)
GLACE: Global Local Accelerated Coordinate Encoding
Fangjinhua Wang (None) · Xudong Jiang (ETHZ - ETH Zurich) · Silvano Galliani (Microsoft) · Christoph Vogel (Microsoft) · Marc Pollefeys (ETH Zurich / Microsoft)
Active Prompt Learning in Vision Language Models
Jihwan Bang (KAIST) · Sumyeong Ahn (Michigan State University) · Jae-Gil Lee (Korea Advanced Institute of Science and Technology)
HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
HyunJun Jung (Technische Universität München) · Shun-Cheng Wu (Technical University Munich) · Patrick Ruhkamp (Technical University Munich) · Guangyao Zhai (Technical University of Munich) · Hannah Schieber (Technische Universität München/Friedrich-Alexander Universität Erlangen-Nürnberg) · Giulia Rizzoli (University of Padua) · Pengyuan Wang (Technische Universität München) · Hongcheng Zhao (Technische Universität München) · Lorenzo Garattoni (Toyota Motor Europe) · Sven Meier (Toyota Motor Europe NV/SA) · Daniel Roth (Technische Universität München) · Nassir Navab (TU Munich) · Benjamin Busam (Technical University of Munich)
FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions
Zhen Liu (Nanjing University) · Hao Zhu (Nanjing University) · Qi Zhang (Tencent AI Lab) · Jingde Fu (Nanjing University) · Weibing Deng (Nanjing University) · Zhan Ma (Nanjing University) · Yanwen Guo (Nanjing University) · Xun Cao (Nanjing University)
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai (University of Science and Technology of China) · Qingsong Yao (University of the Chinese Academy of Sciences) · Zihang Jiang (University of Science and Technology of China) · Rongsheng Wang (University of Science and Technology of China) · Zhiyang He (Xunfei Healthcare Technology Co., Ltd.) · Xiaodong Tao (Xunfei Healthcare Co. Ltd) · S Kevin Zhou (University of Science and Technology of China)
Customization Assistant for Text-to-image Generation
Yufan Zhou (State University of New York, Buffalo) · Ruiyi Zhang (Adobe Research) · Jiuxiang Gu (Adobe Systems) · Tong Sun (Adobe Systems)
UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
Mingyuan Zhou (Innopeak Technology) · Rakib Hyder (Oppo, Seattle, USA) · Ziwei Xuan (Innopeak Technology) · Guo-Jun Qi (University of Central Florida)
Event-based Structure-from-Orbit
Ethan Elms (University of Adelaide) · Yasir Latif (The University of Adelaide) · Tae Ha Park (Stanford University) · Tat-Jun Chin (None)
From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
Jaeho Moon (KAIST) · Juan Luis Gonzalez Bello (KAIST) · Byeongjun Kwon (KAIST) · Munchurl Kim (Korea Advanced Institute of Science and Technology)
Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack
Sabbir Ahmed (State University of New York at Binghamton) · RANYANG ZHOU (New Jersey Institute of Technology) · Shaahin Angizi (New Jersey Institute of Technology) · Adnan Rakin Rakin (None)
Dynamic LiDAR Re-simulation using Compositional Neural Fields
Hanfeng Wu (None) · Xingxing Zuo (Caltech) · Stefan Leutenegger (Department of Informatics, Technische Universität München) · Or Litany (NVIDIA / Technion) · Konrad Schindler (ETH Zurich) · Shengyu Huang (None)
Unsupervised Blind Image Deblurring Based on Self-Enhancement
Lufei Chen (Sichuan University) · Xiangpeng Tian (Sichuan University) · Shuhua Xiong (Sichuan University) · Yinjie Lei (Sichuan University) · Chao Ren (Sichuan University)
ProMotion: Prototypes As Motion Learners
Yawen Lu (Purdue University) · Dongfang Liu (Rochester Institute of Technology) · Qifan Wang (Meta AI) · Cheng Han (Rochester Institute of Technology) · Yiming Cui (University of Florida) · Zhiwen Cao (Purdue University) · Xueling Zhang (Rochester Institute of Technology) · Yingjie Victor Chen (Purdue University) · Heng Fan (University of North Texas)
Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping
Ruoxi Zhu (Fudan University) · Shusong Xu (Alibaba Group) · Peiye Liu (Alibaba Group) · Sicheng Li (Alibaba Group) · Yanheng Lu (Alibaba Group) · Dimin Niu (Alibaba Group) · Zihao Liu (Alibaba Group) · Zihao Meng (Alibaba Group) · Li Zhiyong (Alibaba Group) · Xinhua Chen (Fudan University) · Yibo Fan (Fudan University)
Mask Grounding for Referring Image Segmentation
Yong Xien Chng (None) · Henry Zheng (Tsinghua University) · Yizeng Han (Tsinghua University) · Xuchong QIU (Bosch) · Gao Huang (Tsinghua University)
SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields
Quentin HERAU (Huawei/University of Burgundy) · Nathan Piasco (Huawei Technologies Ltd.) · Moussab Bennehar (Huawei Noah's Ark Lab) · Luis Guillermo Roldao Jimenez (Huawei Technologies Ltd.) · Dzmitry Tsishkou (Huawei Technologies Ltd.) · Cyrille Migniot (University of Burgundy) · Modélisation Information Systèmes (Université de Picardie Jules-Verne) · Cedric Demonceaux (Université de Bourgogne)
SemCity: Semantic Scene Generation with Triplane Diffusion
Jumin Lee (Korea Advanced Institute of Science and Technology) · Sebin Lee (Korea Advanced Institute of Science and Technology (KAIST)) · Changho Jo (Neosapience) · Woobin Im (Korea Advanced Institute of Science and Technology) · Ju-hyeong Seon (Korea Advanced Institute of Science & Technology) · Sung-Eui Yoon (KAIST)
$V_kD$: Improving knowledge distillation using orthogonal projections
Roy Miles (Imperial College London) · Ismail Elezi (Huawei Noah's Ark) · Jiankang Deng (Imperial College London & Huawei UKRD)
Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
Yiwei Yang (University of Washington) · Anthony Liu (University of Michigan) · Robert Wolfe (University of Washington) · Aylin Caliskan (University of Washington) · Bill Howe (University of Washington)
StyLitGAN: Image-based Relighting via Latent Control
Anand Bhattad (None) · James Soole (University of Illinois Urbana-Champaign) · David Forsyth (University of Illinois at Urbana-Champaign)
Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability
Jaehui Hwang (Yonsei University) · Junghyuk Lee (Yonsei University) · Jong-Seok Lee (Yonsei University)
ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Weiyao Wang (Facebook) · Pierre Gleize (Polytech Nice Sophia) · Hao Tang (Meta Platforms) · Xingyu Chen (Facebook) · Kevin Liang (FAIR at Meta) · Matt Feiszli (Meta AI)
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling (University of Science and Technology of China) · Lin Chen (University of Science and Technology of China) · Pan Zhang (Shanghai Artificial Intelligence Laboratory) · Huaian Chen (University of Science and Technology of China) · Yi Jin (University of Science and Technology of China) · Jinjin Zheng (University of Science and Technology of China)
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon (Yonsei University) · Dohyung Kim (Yonsei University) · Jun Yong Cheon (Yonsei University) · Bumsub Ham (Yonsei University)
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Nilesh Kulkarni (None) · Davis Rempe (NVIDIA) · Kyle Genova (Google) · Abhijit Kundu (Google) · Justin Johnson (University of Michigan) · David Fouhey (New York University) · Leonidas Guibas (Stanford University)
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Tianyi Xie (University of California, Los Angeles) · Zeshun Zong (University of California, Los Angeles) · Yuxing Qiu (UCLA & LightSpeed Studios) · Xuan Li (None) · Yutao Feng (Zhejiang University) · Yin Yang (University of Utah) · Chenfanfu Jiang (University of California, Los Angeles)
Viewpoint-Aware Visual Grounding in 3D Scenes
Xiangxi Shi (Oregon State University) · Zhonghua Wu (SenseTime) · Stefan Lee (Oregon State University)
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
Feilong Tang (Monash University) · Zhongxing Xu (Weill Cornell Medicine, Cornell University) · Zhaojun QU (Xi'an Jiaotong-Liverpool University) · Wei Feng (Monash University) · xingjian jiang (University of Michigan - Ann Arbor) · Zongyuan Ge (Monash University)
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kang chenkang (Huaihai Institute of Technology) · Xiangqian Wu (Harbin Institute of Technology)
Space-time Diffusion Features for Zero-shot Text-driven Motion Transfer
Rafail Fridman (Weizmann Institute of Science) · Danah Yatim (Weizmann Institute of Science) · Omer Bar-Tal (Weizmann Institute of Science) · Yoni Kasten (NVIDIA Research) · Tali Dekel (Weizmann Institute of Science)
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan (Kuaishou Technology) · Hongbo Liu (Tsinghua University) · Mading Li (Kuaishou Technology) · Muyi Sun (Institute of automation, Chinese Academy of Sciences) · Ming Sun (Kuaishou Tech) · Jiachao Gong (Beijing Kuaishou) · Jinhua Hao (Kuaishou Tech) · Chao Zhou (Peking University) · Yansong Tang (Tsinghua University)
Batch Normalization Alleviates the Spectral Bias in Coordinate Networks
Zhicheng Cai (Nanjing University) · Hao Zhu (Nanjing University) · Qiu Shen (Nanjing University) · Xinran Wang (Nanjing University) · Xun Cao (Nanjing University)
CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Sachin Shah (University of Maryland, College Park) · Matthew Chan (Department of Computer Science, University of Maryland, College Park) · Haoming Cai (University of Maryland, College Park) · Jingxi Chen (University of Maryland College Park) · Sakshum Kulshrestha (University of Maryland, College Park) · Chahat Deep Singh (University of Maryland, College Park) · Yiannis Aloimonos (University of Maryland, College Park) · Christopher Metzler (University of Maryland, College Park)
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Zhe Li (Tsinghua University) · Zerong Zheng (Tsinghua University) · Lizhen Wang (Tsinghua University) · Yebin Liu (Tsinghua University)
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Chen (Technische Universität München) · Haoxuan Li (Technische Universität München) · Hsin-Ying Lee (Snap Inc.) · Sergey Tulyakov (Snap Inc.) · Matthias Nießner (Technical University of Munich)
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li (Nanyang Technological University) · Haobo Yuan (Nanyang Technological University) · Wei Li (Nanyang Technological University) · Henghui Ding (Fudan University) · Size Wu (Nanyang Technological University) · Wenwei Zhang (None) · Yining Li (Shanghai AI Laboratory) · Kai Chen (Shanghai AI Laboratory) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY)
Pose Adapted Shape Learning for Large-Pose Face Reenactment
Gee-Sern Hsu (None) · Jie-Ying Zhang (National Taiwan University of Science and Technology) · Yu-Hsiang Huang (National Taiwan University of Science and Technology) · Wei-Jie Hong (National Taiwan University of Science and Technology)
Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models
Gianni Franchi (ENSTA Paris) · Olivier Laurent (Université Paris-Saclay) · Maxence Leguéry (ENSTA Paris) · Andrei Bursuc (valeo.ai) · Andrea Pilzer (NVIDIA) · Angela Yao (National University of Singapore)
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti (University of Modena and Reggio Emilia) · Roberto Amoroso (University of Modena and Reggio Emilia) · Marcella Cornia (University of Modena and Reggio Emilia) · Lorenzo Baraldi (Università degli Studi di Modena e Reggio Emilia) · Rita Cucchiara (Università di Modena e Reggio Emilia)
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu (The University of Tokyo) · Zihao Zhu (Keio University) · Giorgio Becherini (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · YICHEN PENG (Japan Advanced Institute of Science and Technology, Tokyo Institute of Technology) · Mingyang Su (Tsinghua University) · YOU ZHOU (Huawei Technologies Ltd.) · Xuefei Zhe (City University of Hong Kong) · Naoya Iwamoto (Huawei Technologies Japan K.K.) · Bo Zheng (Huawei Technologies Japan) · Michael J. Black (University of Tübingen)
Explaining CLIP's performance disparities on data from blind/low vision users
Daniela Massiceti (Microsoft Research) · Camilla Longden (Microsoft Research, Cambridge) · Agnieszka Słowik (Microsoft) · Samuel Wills (World Bank) · Martin Grayson (Research, Microsoft) · Cecily Morrison (Microsoft Research)
MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Yanting Wang (Pennsylvania State University) · Hongye Fu (Zhejiang University) · Wei Zou (Pennsylvania State University) · Jinyuan Jia (Pennsylvania State University)
NB-GTR: Narrow-Band Guided Turbulence Removal
Yifei Xia (Peking University) · Chu Zhou (Peking University) · Chengxuan Zhu (Peking University) · Minggui Teng (Peking University) · Chao Xu (Peking University) · Boxin Shi (Peking University)
Mudslide: A Universal Nuclear Instance Segmentation Method
Jun Wang (Peking University)
LaneCPP: Continuous 3D Lane Detection using Physical Priors
Maximilian Pittner (Bosch) · Joel Janai (Robert Bosch GmbH, Bosch) · Alexandru Paul Condurache (None)
Large Language Models are Good Prompt Learners for Low-Shot Image Classification
Zhaoheng Zheng (University of Southern California) · Jingmin Wei (University of Southern California) · Xuefeng Hu (University of Southern California) · Haidong Zhu (University of Southern California) · Ram Nevatia (None)
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Yufei Zhang (None) · Jeffrey Kephart (IBM, International Business Machines) · Zijun Cui (University of Southern California) · Qiang Ji (Rensselaer Polytechnic Institute)
LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment
yiming ren (None) · xiao han (ShanghaiTech University) · Chengfeng Zhao (ShanghaiTech University) · Jingya Wang (ShanghaiTech University) · Lan Xu (ShanghaiTech University) · Jingyi Yu (Shanghai Tech University) · Yuexin Ma (ShanghaiTech University)
G3DR: Generative 3D Reconstruction in ImageNet
Pradyumna Reddy · Ismail Elezi (Huawei Noah's Ark) · Jiankang Deng (Imperial College London & Huawei UKRD)
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu (University of Washington) · Otilia Stretcu (Google Research) · Chun-Ta Lu (Google Research) · Krishnamurthy Viswanathan (Google) · Kenji Hata (Google) · Enming Luo (Google) · Ranjay Krishna (University of Washington) · Ariel Fuxman (Google)
Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception
Lei Fan (Northwestern University) · Mingfu Liang (Northwestern University) · Yunxuan Li (Northwestern University) · Gang Hua (Wormpex AI Research) · Ying Wu (Northwestern University)
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal (University of Missouri) · Aditya Avinash (Google) · Neil Alldrin (Google) · Jan Dlabal (Research, Google) · Wenlei Zhou (Google) · Enming Luo (Google) · Otilia Stretcu (Google Research) · Hao Xiong (Google) · Chun-Ta Lu (Google Research) · Howard Zhou (Google Research) · Ranjay Krishna (University of Washington) · Ariel Fuxman (Google) · Tom Duerig (Google)
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong (Fudan University) · Shilin Yan (Fudan University) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Wanyun Li (Fudan University) · Xinyu Zhou (None) · Pinxue Guo (Fudan University) · Kaixun Jiang (Fudan University) · Yiting Cheng (None) · Jinglun Li (None) · Zhaoyu Chen (Fudan University) · Wenqiang Zhang (None)
FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding
Jinglin Xu (University of Science and Technology Beijing) · Guohao Zhao (Peking University) · Sibo Yin (Peking University) · Wenhao Zhou (University of Science and Technology Beijing) · Yuxin Peng (Peking University)
An Aggregation-Free Federated Learning for Tackling Data Heterogeneity
Yuan Wang (Institute of High Performance Computing, Singapore, A*STAR) · Huazhu Fu (Institute of High Performance Computing, Singapore, A*STAR) · Renuga Kanagavelu (Institute of High Performance Computing, Singapore, A*STAR) · Qingsong Wei (Agency for Science, Technology and Research (A*STAR)) · Yong Liu (Institute of High Performance Computing, Singapore, A*STAR) · Rick Goh (Institute of High Performance Computing, Singapore, A*STAR)
Infrared Adversarial Car Stickers
Xiaopei Zhu (Tsinghua University) · Yuqiu Liu (Beijing Forestry University) · Zhanhao Hu (UC Berkeley) · Jianmin Li (Department of Computer Science and Technology, Tsinghua University) · Xiaolin Hu (Tsinghua University)
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina (University of Trento) · Massimiliano Mancini (University of Trento) · Elia Cunegatti (University of Trento) · Gaowen Liu (None) · Giovanni Iacca (University of Trento) · Elisa Ricci (University of Trento)
CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
Yiyu Chen (Beijing Institute of Technology) · Zheyi Fan (Beijing Institute of Technology) · Zhaoru Chen (Beijing Institute of Technology) · Yixuan Zhu (Beijing Institute of Technology)
Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
Muhammad Kashif Ali (Hanyang University) · Eun Woo Im (Hanyang University) · Dongjin Kim (Hanyang University) · Tae Hyun Kim (Hanyang Univ.)
CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition
Qixuan Zheng (City University of Hong Kong) · Ming Zhang (Hong Kong Applied Science and Technology Research Institute (ASTRI)) · Hong Yan (City University of Hong Kong)
SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology
Saarthak Kapse (State University of New York at Stony Brook) · Pushpak Pati (International Business Machines) · Srijan Das (University of North Carolina at Charlotte) · Jingwei Zhang (None) · Chao Chen (State University of New York, Stony Brook) · Maria Vakalopoulou (CentraleSupelec) · Joel Saltz (State University of New York at Stony Brook) · Dimitris Samaras (Stony Brook University) · Rajarsi Gupta (Academic medical center at State University of New York at Stony Brook) · Prateek Prasanna (State University of New York, Stony Brook)
Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction
Cheng Sun (NVIDIA) · Wei-En Tai (National Tsinghua University) · Yu-Lin Shih (National Tsinghua University) · Kuan-Wei Chen (National Tsinghua University) · Yong-Jing Syu (National Tsinghua University) · Kent Selwyn The (National Tsinghua University) · Yu-Chiang Frank Wang (NVIDIA) · Hwann-Tzong Chen (National Tsing Hua University)
Boosting Adversarial Transferability by Block Shuffle and Rotation
Kunyu Wang (The Chinese University of Hong Kong) · he xuanran (TikTok) · Wenxuan Wang (The Chinese University of Hong Kong) · Xiaosen Wang (Huazhong University of Science and Technology)
Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks
Bowen Deng (Computer Vision Laboratory University of Nottingham) · Siyang Song (University of Leicester) · Andrew French (University of Nottingham) · Denis Schluppeck (University of Nottingham) · Michael Pound (University of Nottingham)
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar (Michigan State University) · Yuliang Guo (Bosch US Research) · Xinyu Huang (Robert Bosch Research NA) · Liu Ren (Bosch Research) · Xiaoming Liu (None)
GALA: Generating Animatable Layered Assets from a Single Scan
Taeksoo Kim (Seoul National University) · Byungjun Kim (Seoul National University) · Shunsuke Saito (Reality Labs Research) · Hanbyul Joo (Seoul National University)
Single Mesh Diffusion Models with Field Latents for Texture Generation
Thomas W. Mitchel (PlayStation) · Carlos Esteves (Google Research) · Ameesh Makadia (Google Research)
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo (CMU, Carnegie Mellon University) · Pedro Morgado (None)
Hearing Anything Anywhere
Mason Wang (Stanford University) · Ryosuke Sawata (Sony Research) · Samuel Clarke (Stanford University) · Ruohan Gao (Stanford University) · Shangzhe Wu (Stanford University) · Jiajun Wu (Stanford University)
Move Anything with Layered Scene Diffusion
Jiawei Ren (Nanyang Technological University) · Mengmeng Xu (Meta AI) · Jui-Chieh Wu (Meta) · Ziwei Liu (Nanyang Technological University) · Tao Xiang (University of Surrey) · Antoine Toisoul (Meta)
Learning Diffusion Texture Priors for Image Restoration
Tian Ye (Hong Kong University of Science and Technology, Guangzhou Campus) · Sixiang Chen (Hong Kong University of Science and Technology (GZ)) · Wenhao Chai (University of Washington) · Zhaohu Xing (Hong Kong University of Science and Technology) · Jing Qin (Hong Kong Polytechnic University) · Ge lin (Hong Kong University of Science and Technology (Guangzhou)) · Lei Zhu (Hong Kong University of Science and Technology (Guangzhou) & HKUST)
DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos
Arjun Balasingam (Massachusetts Institute of Technology) · Joseph Chandler (Massachusetts Institute of Technology) · Chenning Li (None) · Zhoutong Zhang (Adobe Systems) · Hari Balakrishnan (Massachusetts Institute of Technology)
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng (Shenzhen MSU-BIT University) · Xiaoyong Chen (Shenzhen University) · Jiaming Liang (Shenzhen University) · Huisi Wu (Shenzhen University) · Guang-Zhong Cao (Shenzhen University) · Yong Guo (Max-Planck Institute for Informatics)
Implicit Event-RGBD Neural SLAM
Delin Qu (Fudan University) · Chi Yan (Shanghai AI Laboratory) · Dong Wang (Shanghai AI Laboratory) · Jie Yin (Shanghai Jiaotong University) · Qizhi Chen (None) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology) · Yiting Zhang (Zhejiang University) · Bin Zhao (Northwest Polytechnical University Xi'an) · Xuelong Li (Northwestern Polytechnical University)
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection
Xiaowei Zhao · Xianglong Liu (BUAA) · Duorui Wang (Beijing University of Aeronautics and Astronautics) · Yajun Gao (Beihang University) · Zhide Liu (Beijing University of Aeronautics and Astronautics)
RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
Yi Rong (Nanjing University) · Haoran Zhou (Nanjing University) · Kang Xia (Nanjing University) · Cheng Mei (Nanjing University) · Jiahao Wang · Tong Lu (Nanjing University)
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
Jiun Tian Hoe (Nanyang Technological University) · Xudong Jiang (Nanyang Technological University) · Chee Seng Chan (Universiti Malaya) · Yap-peng Tan (Nanyang Technological University) · Weipeng Hu (Nanyang Technological University)
Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
Shiyan Chen (Peking University) · Jiyuan Zhang (Peking University) · Zhaofei Yu (Peking University) · Tiejun Huang (Peking University)
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge
Bo Zou (Computer Science, Tsinghua University) · Shaofeng Wang (Capital Medical University) · Hao Liu (Tsinghua University) · Gaoyue Sun (Imperial College London) · Yajie Wang (Tsinghua University) · Zuo FeiFei (LargeV Inc.) · Chengbin Quan (Tsinghua University) · Youjian Zhao (Tsinghua University)
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
Wenxiao Deng (Nanjing University) · Wenbin Li (Nanjing University) · Tianyu Ding (Microsoft) · Lei Wang (University of Wollongong) · Hongguang Zhang (Systems Engineering Institute, AMS) · Kuihua Huang (National University of Defense Technology) · Jing Huo (Nanjing University) · Yang Gao (Nanjing University)
Multi-Level Neural Scene Graphs for Dynamic Urban Environments
Tobias Fischer (ETH Zurich) · Lorenzo Porzi (Facebook) · Samuel Rota Bulò (Meta) · Marc Pollefeys (ETH Zurich / Microsoft) · Peter Kontschieder (Meta)
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Junming Chen (Hong Kong University of Science and Technology) · Yunfei Liu (International Digital Economy Academy (IDEA)) · Jianan Wang (None) · Ailing Zeng (IDEA) · Yu Li (International Digital Economy Academy) · Qifeng Chen (Hong Kong University of Science and Technology)
One-step Diffusion with Distribution Matching Distillation
Tianwei Yin (Massachusetts Institute of Technology) · Michaël Gharbi (Massachusetts Institute of Technology) · Richard Zhang (Adobe Systems) · Eli Shechtman (Adobe) · Fredo Durand (Massachusetts Institute of Technology) · William Freeman (MIT and Google) · Taesung Park (Adobe Systems)
Differentiable Display Photometric Stereo
Seokjun Choi (Pohang University of Science and Technology) · Seungwoo Yoon (POSTECH) · Giljoo Nam (Meta) · Seungyong Lee (POSTECH) · Seung-Hwan Baek (POSTECH)
On Exact Inversion of DPM-Solvers
Seongmin Hong (Seoul National University) · Kyeonghyun Lee (Seoul National University) · Suh Yoon Jeon (Seoul National University) · Hyewon Bae (Seoul National University) · Se Young Chun (Seoul National University)
Re-thinking Data Availability Attacks Against Deep Neural Networks
Bin Fang (Shanghai Jiao Tong University) · Bo Li (vivo Mobile Communication Co.,Ltd.) · Shuang Wu (Tencent YouTu Lab) · Shouhong Ding (Tencent Youtu Lab) · Ran Yi (Shanghai Jiao Tong University) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
Yuxi Mi (Fudan University) · Zhizhou Zhong (Fudan University) · Yuge Huang (Tencent Youtu Lab) · Jiazhen Ji (Tencent Youtu Lab) · Jianqing Xu (HIT) · Jun Wang (None) · ShaoMing Wang (WeChat Pay Lab33) · Shouhong Ding (Tencent Youtu Lab) · Shuigeng Zhou (Fudan University)
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang (University of Michigan) · Ziqiao Ma (University of Michigan) · Xiaofeng Gao (Amazon AGI) · Suhaila Shakiah (Amazon) · Qiaozi Gao (Amazon) · Joyce Chai (University of Michigan)
DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting
Jer Pelhan (University of Ljubljana) · Alan Lukezic (University of Ljubljana) · Vitjan Zavrtanik (University of Ljubljana) · Matej Kristan (University of Ljubljana)
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do (Ajou University) · Taehoon Kim (Ajou University) · JAEMIN NA (Tech. Innovation Group, KT) · Jiwon Kim (Robotics Lab, Hyundai Motor Company) · Keonho LEE (Hyundai Motor Company) · Kyunghwan Cho (Hyundai Motor Company) · Wonjun Hwang (Ajou University)
MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying
Ryan Burgert (Stony Brook University) · Brian Price (Adobe Research) · Jason Kuen (Adobe Research) · Yijun Li (Adobe Research) · Michael Ryoo (Stony Brook University)
CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral (Virginia Tech) · Enis Simsar (ETH Zurich) · Federico Tombari (Google, TUM) · Pinar Yanardag (Virginia Polytechnic Institute and State University)
Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization
Insoo Kim (Korea Advanced Institute of Science and Technology) · Jae Seok Choi (Samsung Advanced Institute of Technology (SAIT)) · Geonseok Seo (Samsung) · Kinam Kwon (Samsung) · Jinwoo Shin (Korea Advanced Institute of Science and Technology) · Hyong-Euk Lee (Samsung Advanced Institute of Technology)
Self-Supervised Multi-Object Tracking with Path Consistency
Zijia Lu (Northeastern University) · Bing Shuai (Amazon Web Service) · Yanbei Chen (Amazon) · Zhenlin Xu (Amazon) · Davide Modolo (Amazon)
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
Lin Li (King's College London) · Haoyan Guan (King's College London, University of London) · Jianing Qiu (Imperial College London) · Michael Spratling (King's College London and University of Luxembourg)
Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model
Wenfeng Song (Beijing Information Science and Technology University) · Xingliang Jin (Beijing Information Science and Technology University) · Shuai Li (Beijing University of Aeronautics and Astronautics) · Chenglizhao Chen (China University of Petroleum) · Aimin Hao (None) · Xia HOU (Beijing Information Science & Technology University) · Ning Li (Beijing Information Science and Technology University) · Hong Qin (Stony Brook University (State University of New York at Stony Brook))
SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction
Zhiyang Yao (Department of Electronic Engineering, Tsinghua University) · Shuyang Liu (Tsinghua University) · Xiaoyun Yuan (Tsinghua University) · Lu Fang (Tsinghua University)
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
Jijie He (Zhejiang Gongshang University) · Wenwu Yang (Zhejiang Gongshang University)
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Argaw Argaw (None) · Seunghyun Yoon (Adobe Research) · Fabian Caba Heilbron (Adobe Research) · Hanieh Deilamsalehy (None) · Trung Bui (Adobe Research) · Zhaowen Wang (Adobe Research) · Franck Dernoncourt (Adobe Systems) · Joon Chung (KAIST)
Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen (University of Oxford) · Yash Bhalgat (Visual Geometry Group, University of Oxford) · Xinghui Li (University of Oxford) · Jia-Wang Bian (University of Oxford) · Kejie Li (University of Oxford) · Zirui Wang (University of Oxford) · Victor Adrian Prisacariu (None)
Single-View Scene Point Cloud Human Grasp Generation
Yan-Kang Wang (SUN YAT-SEN UNIVERSITY) · Chengyi Xing (Stanford University) · Yi-Lin Wei (SUN YAT-SEN UNIVERSITY) · Xiao-Ming Wu (SUN YAT-SEN UNIVERSITY) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY)
UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement
Yaofeng Xie (Ocean University of China) · Lingwei Kong (Sanya Oceanographic Institution, Ocean University of China) · Kai Chen (Sanya Oceanographic Institution, Ocean University of China) · Zheng Ziqiang (Hong Kong University of Science and Technology) · Xiao Yu (Sanya Oceanographic Institution, Ocean University of China) · Zhibin Yu (Sanya Oceanographic Institution, Ocean University of China) · Bing Zheng (Sanya Oceanographic Institution, Ocean University of China)
GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects
Sungphill Moon (Naver Labs) · Hyeontae Son (Naver Labs) · Dongcheol Hur (NAVER LABS) · Sangwook Kim (Naver Labs)
DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng (Peking University Shenzhen Graduate School) · Liwei Liao (Peking University) · Xufeng Li (City University of Hong Kong) · Jianbo Jiao (University of Birmingham) · Rongjie Wang (PengCheng Laboratory) · Feng Gao (Peking University) · Shiqi Wang (City University of Hong Kong) · Ronggang Wang (Peking University Shenzhen Graduate School)
Improving Depth Completion via Depth Feature Upsampling
Yufei Wang (Northwest Polytechnical University Xi'an) · Ge Zhang (Northwest Polytechnical University Xi'an) · Shaoqian Wang (Northwest Polytechnical University Xi'an) · Bo Li (None) · Qi Liu (Northwest Polytechnical University Xi'an) · Le Hui (Nanjing University Of Science And Technology) · Yuchao Dai (Northwestern Polytechnical University)
Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
Nobuhiko Wakai (Panasonic Holdings Corporation) · Satoshi Sato (Panasonic Holdings Corporation) · Yasunori Ishii (Panasonic Holdings Corporation) · Takayoshi Yamashita (Chubu University)
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang (Zhejiang University) · Sha Zhang (None) · Di Huang (University of Sydney) · Xiaoyang Wu (The University of Hong Kong) · Haoyi Zhu (University of Science and Technology of China) · Tong He (Shanghai AI Lab) · SHIXIANG TANG (The Chinese University of Hong Kong) · Hengshuang Zhao (The University of Hong Kong) · Qibo Qiu (Zhejiang Lab) · Binbin Lin (Zhejiang University) · Xiaofei He (Zhejiang University) · Wanli Ouyang (University of Sydney)
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel (Arizona State University) · Changhoon Kim (Arizona State University) · Sheng Cheng (Arizona State University) · Chitta Baral (Arizona State University) · 'YZ' Yezhou Yang (Arizona State University)
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Ziyang Chen (University of Michigan) · Israel D. Gebru (Facebook) · Christian Richardt (Meta Reality Labs) · Anurag Kumar (Facebook) · William Laney (Meta) · Andrew Owens (University of Michigan) · Alexander Richard (Reality Labs Research, Meta)
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
Chuangchuang Tan (Beijing Jiaotong University) · Huan Liu (Beijing Jiaotong University) · Yao Zhao (Beijing Jiaotong University) · Shikui Wei (Beijing Jiaotong University) · Guanghua Gu (Yan Shan University) · Ping Liu (Institute of High Performance Computing, Singapore, A*STAR) · Yunchao Wei (Beijing Jiaotong University)
Nearest Is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li (Wuhan University) · Yishuo Cai (Central South University) · Haowei Li (Wuhan University) · Feng Xue (ZJU-Hangzhou Global Scientific and Technological Innovation Center) · Zhifeng Li (Tencent) · Yiming Li (Zhejiang University)
Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception
Haoming Chen (East China Normal University) · Zhizhong Zhang (East China Normal University) · Yanyun Qu (Xiamen University) · Ruixin Zhang (Tencent Youtu Lab) · Xin Tan (East China Normal University) · Yuan Xie (East China Normal University)
Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
Chengxu Zuo (Xiamen University) · Yiming Wang (Xiamen University) · Lishuang Zhan (Xiamen University) · Shihui Guo (Xiamen University) · Xinyu Yi (Tsinghua University) · Feng Xu (Tsinghua University) · Yipeng Qin (Cardiff University)
Neural Exposure Fusion for High-Dynamic Range Object Detection
Emmanuel Onzon (Torc Robotics) · Maximilian Bömer (Torc Robotics) · Fahim Mannan · Felix Heide (Department of Computer Science, Princeton University)
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu (National University of Singapore) · Wenjie Wang (National University of Singapore) · Yongqi Li (Hong Kong Polytechnic University) · Hanwang Zhang (Nanyang Technological University) · Liqiang Nie (Harbin Institute of Technology (Shenzhen)) · Tat-seng Chua (National University of Singapore)
TEA: Test-time Energy Adaptation
Yige Yuan (None) · Bingbing Xu (Institute of Computing Technology, Chinese Academy of Sciences) · Liang Hou (Kuaishou Technology) · Fei Sun (Institute of Computing Technology, Chinese Academy of Sciences) · Huawei Shen (Institute of Computing Technology, Chinese Academy of Sciences) · Xueqi Cheng (Chinese Academy of Sciences)
Model Adaptation for Time Constrained Embodied Control
Jaehyun Song (Sungkyunkwan University) · Minjong Yoo (Sungkyunkwan University) · Honguk Woo (Sungkyunkwan University)
GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes
Haozhe Lin (None) · Chunyu Wei (Tsinghua University) · Li He (None) · Yuchen Guo (Tsinghua University) · Yuchy Zhao (Tsinghua University) · Shanglong Li (Tsinghua University) · Lu Fang (Tsinghua University)
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang (Seoul National University) · Francis Williams (NVIDIA) · Žan Gojčič (NVIDIA) · Karsten Kreis (NVIDIA) · Sanja Fidler (Department of Computer Science, University of Toronto) · Young Min Kim (Seoul National University) · Amlan Kar (NVIDIA)
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang (Oregon State University) · Saeed Khorram (Apple) · Li Fuxin (Oregon State University)
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed (Khalifa University of Science and Technology) · Arif Mahmood (Information Technology University, Lahore) · IYYAKUTTI IYAPPAN GANAPATHI (Khalifa University of Science, Technology and Research) · Fayaz Ali (Khalifa University of Science, Technology and Research) · Naoufel Werghi (Khalifa University) · Mohammed Bennamoun (University of Western Australia)
TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation
Hoonhee Cho (KAIST) · Taewoo Kim (KAIST) · Yuhwan Jeong (KAIST) · Kuk-Jin Yoon (KAIST)
One-Prompt to Segment All Medical Images
Wu (None) · Min Xu (Carnegie Mellon University)
Quantifying Task Priority for Multi-Task Optimization
Wooseong Jeong (KAIST) · Kuk-Jin Yoon (KAIST)
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Yuheng Jiang (ShanghaiTech University) · Zhehao Shen (ShanghaiTech University) · Penghao Wang (None) · Zhuo Su (ByteDance) · Yu Hong (ShanghaiTech University) · Yingliang Zhang (DGene Inc.) · Jingyi Yu (ShanghaiTech University) · Lan Xu (ShanghaiTech University)
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai (New York University) · Xichen Pan (New York University) · Sainan Liu (Intel) · Daniele Panozzo (New York University) · Saining Xie (Facebook)
UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
Ruihai Wu (Peking University) · Haoran Lu (Peking University) · Yiyan Wang (Beijing Institute of Technology) · Yubo Wang (Peking University) · Hao Dong (None)
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Yushuang Wu (The Chinese University of Hong Kong (Shenzhen)) · Luyue Shi (The Chinese University of Hong Kong, Shenzhen) · Junhao Cai (Hong Kong University of Science and Technology) · Weihao Yuan (Alibaba Group) · Lingteng Qiu (None) · Zilong Dong (Alibaba Group) · Liefeng Bo (None) · Shuguang Cui (The Chinese University of Hong Kong, Shenzhen) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu (Shanghai Jiao Tong University) · Yikun Liu (Shanghai Jiaotong University) · Ferenas (None) · Chen Ju · Ya Zhang (Shanghai Jiao Tong University) · Yanfeng Wang (Shanghai Jiao Tong University)
Doubly Abductive Counterfactual Inference for Text-based Image Editing
Xue Song (Fudan University) · Jiequan Cui (Nanyang Technological University) · Hanwang Zhang (Nanyang Technological University) · Jingjing Chen (Fudan University) · Richang Hong (Hefei University of Technology) · Yu-Gang Jiang (Fudan University)
Distilling Semantic Priors from SAM to Efficient Image Restoration Models
Quan Zhang (Tsinghua University) · Xiaoyu Liu (University of Science and Technology of China) · Wei Li (Huawei Noah's Ark Lab) · Hanting Chen (Huawei Technologies Ltd.) · Junchao Liu (Huawei Noah's Ark Lab) · Jie Hu (Huawei Technologies Ltd.) · Zhiwei Xiong (None) · Chun Yuan (Tsinghua University) · Yunhe Wang (Huawei Noah's Ark Lab)
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Haiwen Diao (Dalian University of Technology) · Bo Wan (KU Leuven) · Ying Zhang (Tencent) · Xu Jia (Dalian University of Technology) · Huchuan Lu (Dalian University of Technology) · Long Chen (HKUST)
Multimodal autoregressive learning for time-aligned and contextual modalities
AJ Piergiovanni (Google) · Isaac Noble (Google) · Dahun Kim (Google) · Michael Ryoo (Stony Brook University) · Victor Gomes (Google) · Anelia Angelova (Google)
SD2Event: Self-supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras
Yuan Gao (University of Science and Technology of China) · Yuqing Zhu (University of Science and Technology of China) · Xinjun Li (University of Science and Technology of China) · Yimin Du (University of Science and Technology of China) · Tianzhu Zhang (University of Science and Technology of China)
Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation
Xu Zheng (HKUST) · Pengyuan Zhou (Aarhus University) · ATHANASIOS (ICT) · Lin Wang (Hong Kong University of Science and Technology)
Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods
Mengyu Dai (Florida State University) · Amir Hossein Raffiee (SalesForce.com) · Aashish Jain (Salesforce) · Joshua Correa (SalesForce.com)
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah (University of Maryland, College Park) · Sreyan Ghosh (University of Maryland, College Park) · Sonal Kumar (University of Maryland, College Park) · Purva Chiniya (University of Maryland, College Park) · Dinesh Manocha (University of Maryland, College Park)
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Mohammad Sadil Khan (University of Luxembourg) · Elona Dupont (SnT, University of Luxemburg) · Sk Aziz Ali (DFKI GmbH) · Kseniya Cherenkova (University of Luxemburg) · Anis Kacem (University of Luxemburg) · Djamila Aouada (SnT, University of Luxembourg)
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng (ByteDance) · Chao Wen (ByteDance) · Zhuo Su (ByteDance) · Zeran Xu (Bytedance) · Zhaohu Li (ByteDance) · Yang Zhao (ByteDance) · Zhou Xue (Li Auto)
E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator
Wenjun Wu (None) · Lingling Zhang (Xi'an Jiaotong University) · Jun Liu (Xi'an Jiaotong University) · Xi Tang (Xi'an Jiaotong University) · Yaxian Wang (Xi'an Jiaotong University) · Shaowei Wang (Xi'an Jiaotong University) · QianYing Wang (Lenovo Group)
Instance Tracking in 3D Scenes from Egocentric Videos
Yunhan Zhao (University of California, Irvine) · Haoyu Ma (University of California, Irvine) · Shu Kong (University of Macau, Texas A&M University) · Charless Fowlkes (University of California, Irvine)
CAGE: Controllable Articulation GEneration
Jiayi Liu (None) · Hou In Ivan Tam (Simon Fraser University) · Ali Mahdavi Amiri (Simon Fraser University) · Manolis Savva (Simon Fraser University)
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu (Nanyang Technological University) · Tao Hu (Nanyang Technological University) · Ziwei Liu (Nanyang Technological University)
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui (None) · Tengyu Liu (None) · Nian Liu (Beijing University of Posts and Telecommunications) · Yaodong Yang (Peking University) · Yixin Zhu (Peking University) · Siyuan Huang (Beijing Institute of General Artificial Intelligence)
$M^3$-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
Bin Pu (Hong Kong University of Science and Technology) · Liwen Wang (Anhui University) · Jiewen Yang (Hong Kong University of Science and Technology) · He Guannan (Sichuan University) · Xingbo Dong (Anhui University) · Shengli Li (Shenzhen Maternity and Child Healthcare Hospital) · Ying Tan (Shenzhen Maternity and Child Healthcare Hospital) · Ming Chen (Harbin Red Cross Central Hospital) · Zhe Jin (Anhui University) · Kenli Li (Hunan University) · Xiaomeng Li (The Hong Kong University of Science and Technology)
Cyclic Learning for Binaural Audio Generation and Localization
Zhaojian Li (Northwest Polytechnical University) · Bin Zhao (Northwest Polytechnical University Xi'an) · Yuan Yuan (Northwest Polytechnical University Xi'an)
3D Feature Tracking via Event Camera
Siqi Li (Tsinghua University) · Zhou Zhikuan (None) · Zhou Xue (Li Auto) · Yipeng Li (Tsinghua University) · Shaoyi Du (Xi'an Jiaotong University) · Yue Gao (Tsinghua University)
Frequency-aware Event-based Video Deblurring for Real-World Motion Blur
Taewoo Kim (KAIST) · Hoonhee Cho (KAIST) · Kuk-Jin Yoon (KAIST)
Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
Tsu-Ching Hsiao (Woven by Toyota) · Hao-Wei Chen (National Tsing Hua University) · Hsuan-Kung Yang (National Tsinghua University) · Chun-Yi Lee (National Tsing Hua University)
QUADify: Extracting Meshes with Pixel-level Details and Materials from Images
Maximilian Frühauf (ETH Zurich & Disney Research | Studios) · Hayko Riemenschneider (Disney Research|Studios) · Markus Gross (Disney Research, Disney) · Christopher Schroers (Disney Research|Studios, Disney)
FedHCA$^2$: Towards Hetero-Client Federated Multi-Task Learning
Yuxiang Lu (Shanghai Jiao Tong University) · Suizhi Huang (Shanghai Jiao Tong University) · Yuwen Yang (Shanghai Jiao Tong University) · Shalayiding Sirejiding · Yue Ding (Shanghai Jiao Tong University) · Hongtao Lu (Shanghai Jiao Tong University)
Improving Unsupervised Hierarchical Representation with Reinforcement Learning
Ruyi An (Nanyang Technological University) · Yewen Li (Nanyang Technological University) · Xu He (Huawei Technologies Ltd.) · Pengjie Gu (Nanyang Technological University) · Mengchen Zhao (South China University of Technology) · Dong Li (Huawei Technologies Ltd.) · Jianye Hao (Tianjin University) · Bo An (Nanyang Technological University) · Chaojie Wang (Skywork AI) · Mingyuan Zhou (The University of Texas at Austin)
Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Yiming Li (New York University) · Zhiheng Li (New York University) · Nuo Chen (New York University) · Moonjun Gong (New York University) · Zonglin Lyu (New York University) · Zehong Wang (New York University) · Peili Jiang (New York University) · Chen Feng (New York University)
All Rivers Run to the Sea: Private Learning with Asymmetric Flows
Yue Niu (USC) · Ramy E. Ali (Samsung) · Saurav Prakash (University of Illinois at Urbana-Champaign) · Salman Avestimehr (University of Southern California)
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
Chengxiang Fan (Zhejiang University) · Muzhi Zhu (Zhejiang University) · Hao Chen (Zhejiang University) · Yang Liu (Zhejiang University) · Weijia Wu (None) · Huaqi Zhang (Hangzhou VIVO Information Technology Co., Ltd) · Chunhua Shen (Zhejiang University)
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
Tao Hu (Nanyang Technological University) · Fangzhou Hong (Nanyang Technological University) · Ziwei Liu (Nanyang Technological University)
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang (University of California, Los Angeles (UCLA)) · Hongbin Liu (Duke University) · Jinyuan Jia (Pennsylvania State University) · Neil Zhenqiang Gong (Duke University)
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
Tianshu Huang (Carnegie Mellon University) · John Miller (Carnegie Mellon University) · Akarsh Prabhakara (Carnegie Mellon University) · Tao Jin (CMU, Carnegie Mellon University) · Tarana Laroia (CMU, Carnegie Mellon University) · Zico Kolter (Carnegie Mellon University) · Anthony Rowe (Carnegie Mellon University)
Video Interpolation with Diffusion Models
Siddhant Jain (Google Research) · Daniel Watson (Google DeepMind) · Aleksander Holynski (UC Berkeley & Google Research) · Eric Tabellion (Google) · Ben Poole (Google) · Janne Kontkanen (Research, Google)
Dispersed Structured Light for Hyperspectral 3D Imaging
Suhyun Shin (Pohang University of Science and Technology) · Seokjun Choi (Pohang University of Science and Technology) · Felix Heide (Department of Computer Science, Princeton University) · Seung-Hwan Baek (POSTECH)
DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
Simon Doll (Eberhard-Karls-Universität Tübingen) · Niklas Hanselmann (Mercedes Benz Research & Development) · Lukas Schneider (Mercedes Benz Research & Development) · Richard Schulz (Mercedes Benz AG) · Marius Cordts (Mercedes-Benz) · Markus Enzweiler (Esslingen University of Applied Sciences) · Hendrik Lensch (University of Tübingen)
DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning
Shihua Zhang (Wuhan University) · Zizhuo Li (Wuhan University) · Yuan Gao (Wuhan University) · Jiayi Ma (Wuhan University)
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu (None) · Runyu He (Nanjing University) · Gangshan Wu (Nanjing University) · Limin Wang (Nanjing University)
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo (None) · Nian Liu (Mohamed bin Zayed University of Artificial Intelligence) · Wangbo Zhao (National University of Singapore) · Xuguang Yang (Northwestern Polytechnical University Xi'an) · Dingwen Zhang (Northwestern Polytechnical University) · Deng-Ping Fan (ETH Zurich) · Fahad Shahbaz Khan (MBZUAI; Linköping University) · Junwei Han (Northwestern Polytechnical University, Tsinghua University)
FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell (University of Michigan) · Nilesh Kulkarni (None) · Linyi Jin (None) · Jeong Joon Park (Stanford University) · Justin Johnson (University of Michigan) · David Fouhey (New York University)
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Zhiwu Qing (Huazhong University of Science and Technology, Tsinghua University) · Shiwei Zhang (Alibaba Group) · Jiayu Wang (None) · Xiang Wang (Huazhong University of Science and Technology) · Yujie Wei (Fudan University) · Yingya Zhang (Alibaba Group) · Changxin Gao (Huazhong University of Science and Technology) · Nong Sang (Huazhong University of Science and Technology)
Self-supervised debiasing using low rank regularization
Geon Yeong Park (Korea Advanced Institute of Science and Technology) · Chanyong Jung (Korea Advanced Institute of Science and Technology) · Sangmin Lee (Korea Advanced Institute of Science & Technology) · Jong Chul Ye (Korea Advanced Institute of Science and Technology) · Sang Wan Lee (Korea Advanced Institute of Science & Technology)
Neural Markov Random Field for Stereo Matching
Tongfan Guan (The Chinese University of Hong Kong) · Chen Wang (University at Buffalo) · Yun-Hui Liu (The Chinese University of Hong Kong)
Ungeneralizable Examples
Jingwen Ye (National University of Singapore) · Xinchao Wang (National University of Singapore)
Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency
Xu Yingjie (None) · Bangzhen Liu (South China University of Technology) · Hao Tang (School of Computer Science and Engineering, Nanjing University of Science and Technology) · Bailin Deng (Cardiff University) · Shengfeng He (Singapore Management University)
Language-only Training of Zero-shot Composed Image Retrieval
Geonmo Gu (NAVER) · Sanghyuk Chun (NAVER AI Lab) · Wonjae Kim (NAVER) · Yoohoon Kang (NAVER) · Sangdoo Yun (NAVER)
Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs
Sunghwan Hong (Korea University) · Jaewoo Jung (Korea University) · Heeseong Shin (Korea University) · Jiaolong Yang (Microsoft Research) · Chong Luo (Microsoft Research Asia) · Seungryong Kim (Korea University)
ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
Jingyu Zhang (Fudan University) · Kun Yang (Fudan University) · Yilei Wang (Fudan University) · Hanqi Wang (Fudan University) · Peng Sun (Duke Kunshan University) · Liang Song (Fudan University)
SPAD: Spatially Aware Multiview Diffusers
Yash Kant (University of Toronto / Snap Research) · Aliaksandr Siarohin (Snap Inc.) · Ziyi Wu (University of Toronto) · Michael Vasilkovsky (Snap Inc.) · Guocheng Qian (KAUST) · Jian Ren (Snap Inc.) · Riza Alp Guler (Snap Inc.) · Bernard Ghanem (KAUST) · Sergey Tulyakov (Snap Inc.) · Igor Gilitschenski (University of Toronto)
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan (Nanjing University of Science and Technology) · Yuankai Lin (Huazhong University of Science and Technology) · Kun Wang (Nanjing University of Science and Technology) · Yupeng Zheng (Institute of Automation, Chinese Academy of Sciences) · Yufei Wang (Northwest Polytechnical University Xi'an) · Zhenyu Zhang (None) · Jun Li (Nanjing University of Science and Technology) · Jian Yang (Nanjing University of Science and Technology)
Text-to-3D Generation with Bidirectional Diffusion using both 3D and 2D priors
Lihe Ding (The Chinese University of Hong Kong) · Shaocong Dong (Hong Kong University of Science and Technology) · Zhanpeng Huang (SenseTime Research) · Zibin Wang (Sensetime Group Limited) · Yiyuan Zhang (The Chinese University of Hong Kong) · Kaixiong Gong (None) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology) · Tianfan Xue (The Chinese University of Hong Kong)
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu (Google DeepMind) · Kelvin C.K. Chan (Google DeepMind) · Yu-Chuan Su (Google) · Wenhu Chen (University of Waterloo) · Yandong Li (Google Research) · Kihyuk Sohn (Google) · Yang Zhao (Google) · Xue Ben (Google) · William Cohen (Google DeepMind) · Ming-Wei Chang (Google) · Xuhui Jia (Google)
Beyond Average: Individualized Visual Scanpath Prediction
Xianyu Chen (University of Minnesota) · Ming Jiang (University of Minnesota, Minneapolis) · Qi Zhao (University of Minnesota, Minneapolis)
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
Woo-Jin Ahn (Korea University) · Geun-Yeong Yang (Korea University) · Hyunduck Choi (Chonnam National University) · Myo-Taeg Lim (Korea University)
DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan (Tongji University) · Hansheng Chen (Stanford University) · Wei Tian (Tongji University) · Lu Xiong (Tongji University)
Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
Takashi Otonari (The University of Tokyo) · Satoshi Ikehata (NII, Tokyo Institute of Technology) · Kiyoharu Aizawa (The University of Tokyo)
Test-Time Domain Generalization for Face Anti-Spoofing
Qianyu Zhou (Shanghai Jiao Tong University) · Ke-Yue Zhang (Tencent) · Taiping Yao (Tencent Youtu Lab) · Xuequan Lu (La Trobe University) · Shouhong Ding (Tencent Youtu Lab) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation
Jin Wang (China University of Petroleum) · Bingfeng Zhang (China University of Petroleum (East China)) · Jian Pang (China University of Petroleum (East China)) · Honglong Chen (China University of Petroleum) · Weifeng Liu (China University of Petroleum (East China))
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan (Fudan University) · Zechen Bai (Show Lab, National University of Singapore) · Tianjun Xiao (Amazon) · Tong He (Amazon Web Services) · Max Horn (GSK plc) · Yanwei Fu (Fudan University) · Francesco Locatello (ISTA) · Zheng Zhang (New York University)
Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
Leonhard Sommer (University of Freiburg, Albert-Ludwigs-Universität Freiburg) · Artur Jesslen (University of Freiburg) · Eddy Ilg (None) · Adam Kortylewski (University of Freiburg & MPI-INF)
Fooling Polarization-based Vision using Locally Controllable Polarizing Projection
Zhuoxiao Li (The University of Tokyo) · Zhihang Zhong (Shanghai AI Lab) · Shohei Nobuhara (Kyoto Institute of Technology) · Ko Nishino (Kyoto University) · Yinqiang Zheng (None)
Affine Equivariant Networks Based on Differential Invariants
Yikang Li (Peking University) · Yeqing Qiu (The Chinese University of Hong Kong, Shenzhen) · Yuxuan Chen (Peking University) · Lingshen He (Peking University) · Zhouchen Lin (Peking University)
C3: High-performance and low-complexity neural compression from a single image or video
Hyunjik Kim (DeepMind) · Matthias Bauer (Google DeepMind) · Lucas Theis (Google) · Jonathan Richard Schwarz (Harvard University) · Emilien Dupont (Google DeepMind)
AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
Fan Yang (None) · Tianyi Chen (Nanyang Technological University) · XIAOSHENG HE (Nanyang Technological University) · Zhongang Cai (Nanyang Technological University) · Lei Yang (The Chinese University of Hong Kong) · Si Wu (South China University of Technology) · Guosheng Lin (Nanyang Technological University)
ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
Ruoxi Shi (University of California, San Diego) · Xinyue Wei (University of California, San Diego) · Cheng Wang (University of California, San Diego) · Hao Su (UCSD)
Monocular Identity-Conditioned Facial Reflectance Reconstruction
Xingyu Ren (Shanghai Jiao Tong University) · Jiankang Deng (Imperial College London & Huawei UKRD) · Yuhao Cheng (Shanghai Jiaotong University) · Jia Guo (InsightFace.AI) · Chao Ma (Shanghai Jiao Tong University) · Yichao Yan (Shanghai Jiao Tong University) · Wenhan Zhu (None) · Xiaokang Yang (Shanghai Jiao Tong University, China)
3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian (Department of Computer Science, ETHZ - ETH Zurich) · Shaofei Wang (None) · Marko Mihajlovic (Swiss Federal Institute of Technology) · Andreas Geiger (University of Tübingen) · Siyu Tang (ETH Zurich)
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu (South China Normal University) · Chenshu Xu (Singapore Management University) · Yifei Yang (Singapore Management University) · Lihua Zeng (South China Normal University) · Shengfeng He (Singapore Management University)
DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
Clara Maria Fernandez Labrador (Disney Research) · Mertcan Akcay (Disney Research) · Eitan Abecassis (Walt Disney Company) · Joan Massich (Disney Research) · Christopher Schroers (Disney Research|Studios, Disney)
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang (Beijing University of Posts and Telecommunications) · Lei Jin (Beijing University of Posts and Telecommunications) · Zheng Wang (Wuhan University) · Jianshu Li (Ant Group) · Liang Li (None) · Fang Zhao (Tencent AI Lab) · Yu Cheng (National University of Singapore) · Li Yuan (Peking University) · Li ZHOU (Wuhan University) · Junliang Xing (Tsinghua University) · Jian Zhao
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Gianluca Scarpellini (Università degli Studi di Genova, Istituto Italiano di Tecnologia) · Stefano Fiorini (Istituto Italiano di Tecnologia) · Francesco Giuliari (Istituto Italiano di Tecnologia) · Pietro Morerio (Istituto Italiano di Tecnologia) · Alessio Del Bue (Istituto Italiano di Tecnologia (IIT))
MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao (Baidu) · Yifan Sun (Baidu Research) · Wenhao Wang (University of Technology Sydney) · Qiang Chen (Baidu) · Errui Ding (Baidu Inc.) · Yi Yang (Zhejiang University) · Jingdong Wang (Baidu)
Material Palette: Extraction of Materials from a Single Image
Ivan Lopes (INRIA) · Fabio Pizzati (University of Oxford) · Raoul de Charette (Inria)
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
Xu Peng (Xiamen University) · Junwei Zhu (Tencent Youtu Lab) · Boyuan Jiang (Tencent Youtu Lab) · Ying Tai (Nanjing University) · Donghao Luo (Tencent YouTu Lab) · Jiangning Zhang (Tencent Youtu Lab) · Wei Lin (Xiamen University) · Taisong Jin (Xiamen University) · Chengjie Wang (Tencent Youtu Lab; Shanghai Jiao Tong University) · Rongrong Ji (Xiamen University)
TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer (Technical University of Munich (TUM)) · Gerhard Arya Wardana (Department of Informatics, Technische Universität München) · Suren Sritharan (Technische Universität München) · Xingcheng Zhou (Technical University of Munich) · Rui Song (Technical University of Munich) · Alois Knoll (Technical University Munich)
HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
Linglin Jing (Loughborough University) · Yiming Ding (Fudan University) · Yunpeng Gao (Northwest Polytechnical University Xi'an) · Zhigang Wang (Shanghai AI Lab) · Xu Yan (None) · Dong Wang (Shanghai AI Laboratory) · Gerald Schaefer (Loughborough University) · Hui Fang (Loughborough University) · Bin Zhao (Northwest Polytechnical University Xi'an) · Xuelong Li (Northwestern Polytechnical University)
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
Taeheon Kim (Korea Advanced Institute of Science & Technology) · Sebin Shin (KAIST) · Youngjoon Yu (Korea Advanced Institute of Science and Technology (KAIST)) · Hak Gu Kim (Chung-Ang University) · Yong Man Ro (Korea Advanced Institute of Science and Technology)
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Arthur Moreau (Huawei Noah's Ark Lab) · Jifei Song (Huawei Technologies Ltd.) · Helisa Dhamo (None) · Richard Shaw (Huawei Technologies Ltd.) · Yiren Zhou (Huawei Technologies Ltd.) · Eduardo Pérez-Pellitero (Huawei Noah's Ark Lab (UK))
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna (Georgia Institute of Technology) · Yongsen Mao (Simon Fraser University) · Hanxiao Jiang (University of Illinois Urbana-Champaign) · Sanjay Haresh (Qualcomm Inc.) · Brennan Shacklett (Stanford University) · Dhruv Batra (FAIR (Meta) and Georgia Tech) · Alexander William Clegg (Meta AI) · Eric Undersander (Meta) · Angel Xuan Chang (Simon Fraser University) · Manolis Savva (Simon Fraser University)
Learning to Remove Wrinkled Transparent Film with Polarized Prior
Jiaqi Tang (Hong Kong University of Science and Technology (Guangzhou)) · RUIZHENG WU (Smartmore Technology) · Xiaogang Xu (Zhejiang Lab) · Sixing Hu (Smartmore Corporation) · Ying-Cong Chen (The Hong Kong University of Science and Technology)
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao (SUN YAT-SEN UNIVERSITY) · Zeyu Wang (University of California, Santa Cruz) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY) · Cihang Xie (University of California, Santa Cruz) · Yuyin Zhou (UC Santa Cruz)
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim (Michigan State University) · Feng Liu (Michigan State University) · Yiyang Su (None) · Anil Jain (Michigan State University) · Xiaoming Liu (None)
Training Vision Transformers for Semi-Supervised Semantic Segmentation
Xinting Hu (Nanyang Technological University) · Li Jiang (Max Planck Institute for Informatics) · Bernt Schiele (Max Planck Institute for Informatics)
Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li (Carnegie Mellon University) · Jinglu Wang (Microsoft Research Asia) · Xiaohao Xu (University of Michigan - Ann Arbor) · Xiulian Peng (Microsoft Research Asia) · Rita Singh (School of Computer Science, Carnegie Mellon University) · Yan Lu (Microsoft Research Asia) · Bhiksha Raj (Carnegie Mellon University)
NECA: Neural Customizable Human Avatar
Junjin Xiao (School of Computer Science and Engineering, Sun Yat-sen University) · Qing Zhang (SUN YAT-SEN UNIVERSITY) · Zhan Xu (None) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY)
Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM
Linyu Tang (Chongqing University) · Lei Zhang (Chongqing University)
Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
Menghao Zhang (Beijing University of Posts and Telecommunications) · Jingyu Wang (Beijing University of Posts and Telecommunications, Tsinghua University) · Qi Qi (Beijing University of Posts and Telecommunications) · Haifeng Sun (Beijing University of Posts and Telecommunications) · Zirui Zhuang (Beijing University of Posts and Telecommunications) · Pengfei Ren (Beijing University of Posts and Telecommunications) · Ruilong Ma (Beijing University of Posts and Telecommunications) · Jianxin Liao (Beijing University of Posts and Telecommunications)
Uncertainty-aware Action Decoupling Transformer for Action Anticipation
Hongji Guo (Rensselaer Polytechnic Institute) · Nakul Agarwal (Honda Research Institute USA) · Shao-Yuan Lo (Johns Hopkins University) · Kwonjoon Lee (Honda Research Institute USA) · Qiang Ji (Rensselaer Polytechnic Institute)
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization
Xiangyu Yin (University of Liverpool) · Wenjie Ruan (University of Exeter)
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Xiefan Guo (Beihang University) · Jinlin Liu (Alibaba Group) · Miaomiao Cui (Alibaba Group) · Jiankai Li (Beihang University) · Hongyu Yang (Beihang University) · Di Huang (Beihang University)
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Chen Sichen (Shanghai Jiao Tong University) · Yingyi Zhang (Tencent Youtu Lab) · Siming Huang (Duke University) · Ran Yi (Shanghai Jiao Tong University) · Ke Fan (Shanghai Jiaotong University) · Ruixin Zhang (Tencent Youtu Lab) · Peixian Chen (Xiamen University) · Jun Wang (None) · Shouhong Ding (Tencent Youtu Lab) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
DiffLoc: Diffusion Model for Outdoor LiDAR Localization
Wen Li (School of Informatics, Xiamen University) · Yuyang Yang (Xiamen University) · Shangshu Yu (Xiamen University) · Guosheng Hu (Oosto) · Chenglu Wen (Xiamen University) · Ming Cheng (Xiamen University) · Cheng Wang (Xiamen University)
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu (Facebook) · Ching-Yao Chuang (Meta) · Xiaoyan Wang (Massachusetts Institute of Technology) · Yichen Jia (Facebook) · Kapil Krishnakumar (Meta, Inc.) · Tong Xiao (None) · Feng Liang (The University of Texas at Austin) · Licheng Yu (None) · Peter Vajda (Facebook)
An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing
Feiran Hu (Nanjing University of Science and Technology) · Chenlin Zhang (Moonshot AI, Ltd) · Jiangliang GUO (www.ainnovation.com) · Xiu-Shen Wei (Nanjing University of Science and Technology) · Lin Zhao (Nanjing University of Science and Technology) · Anqi Xu (University of Toronto) · Lingyan Gao (AInnovation Lab)
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
Xiao Ma (SEA AI Lab) · Sumit Patidar (Dyson) · Iain Haughton (Dyson Ltd) · Stephen James (Dyson)
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
Jingyuan Yang (Shenzhen University) · Jiawei Feng (Shenzhen University) · Hui Huang (Shenzhen University)
Selective, Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Filip Ilic (Graz University of Technology) · He Zhao (York University) · Thomas Pock (Graz University of Technology) · Richard P. Wildes (York University)
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
Yuhang Yang (University of Science and Technology of China) · Wei Zhai (University of Science and Technology of China) · Hongchen Luo (University of Science and Technology of China) · Yang Cao (University of Science and Technology of China) · Zheng-Jun Zha (University of Science and Technology of China)
Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection
Zhanwei Zhang (None) · Minghao Chen (Zhejiang University) · Shuai Xiao (Alibaba Group) · Liang Peng (FABU Inc) · Hengjia Li (FABU Inc) · Binbin Lin (Zhejiang University) · Ping Li (Hangzhou Dianzi University) · Wenxiao Wang (Zhejiang University) · Boxi Wu (Zhejiang University) · Deng Cai (Zhejiang University)
Prompt-enhanced Multiple Instance Learning for Weakly Supervised Anomaly Detection
Junxi Chen (None) · Liang Li (None) · Li Su (University of Chinese Academy of Sciences) · Zheng-Jun Zha (University of Science and Technology of China) · Qingming Huang (University of Chinese Academy of Sciences)
Brush2Prompt: Contextual Prompt Generator for Object Inpainting
Mang Tik Chiu (University of Illinois, Urbana Champaign) · Yuqian Zhou (University of Illinois, Urbana-Champaign) · Lingzhi Zhang (School of Engineering and Applied Science, University of Pennsylvania) · Zhe Lin (Adobe Research) · Connelly Barnes (Adobe Systems) · Sohrab Amirghodsi (Adobe) · Eli Shechtman (Adobe) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen (Hunan University) · Hongshan Yu (Hunan University) · Zhengeng Yang (Hunan University) · Zechuan Li (Hunan University) · Wei Sun (Hunan University) · Chen Chen
Condition-Aware Neural Network for Controlled Image Generation
Han Cai (Massachusetts Institute of Technology) · Muyang Li (None) · Qinsheng Zhang (Georgia Institute of Technology) · Ming-Yu Liu (NVIDIA) · Song Han (Massachusetts Institute of Technology)
EvDiG: Event-guided Direct and Global Components Separation
xinyu zhou (Peking University) · Peiqi Duan (None) · Boyu Li (Peking University) · Chu Zhou (Peking University) · Chao Xu (Peking University) · Boxin Shi (Peking University)
Sparse views, Near light: A practical paradigm for uncalibrated point-light photometric stereo
Mohammed Brahimi (Technische Universität München) · Bjoern Haefner (Technical University Munich) · Zhenzhang Ye (Technische Universität München) · Bastian Goldluecke (University of Konstanz) · Daniel Cremers (Technical University Munich)
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Yuan Dong (Alibaba Group) · Chuan Fang (Hong Kong University of Science and Technology) · Liefeng Bo (None) · Zilong Dong (Alibaba Group) · Ping Tan (Hong Kong University of Science and Technology)
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo (DAMO Academy, Alibaba Group) · Yufan Shen (Zhejiang University) · Zhaoqing Zhu (Alibaba Group) · Qi Zheng (Alibaba Group) · Zhi Yu (Zhejiang University) · Cong Yao (Alibaba DAMO Academy)
PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
Ruining Deng (Vanderbilt University) · Quan Liu (Vanderbilt University) · Can Cui (Vanderbilt University) · Tianyuan Yao (Vanderbilt University) · Jialin Yue (Vanderbilt University) · Juming Xiong (Vanderbilt University) · Lining yu (Vanderbilt University) · Yifei Wu (Vanderbilt University) · Mengmeng Yin (Vanderbilt University) · Yu Wang (Vanderbilt University Medical Center) · Shilin Zhao (Vanderbilt University) · Yucheng Tang (NVIDIA) · Haichun Yang (Vanderbilt University Medical School) · Yuankai Huo (Vanderbilt University)
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu (None) · Xintao Wang (Tencent) · Yixiao Ge (Tencent) · Ying Shan (Tencent) · Mike Zheng Shou (National University of Singapore)
SHINOBI: SHape and Illumination using Neural Object decomposition via BRDF optimization and Inverse rendering from unconstrained Image collections
Andreas Engelhardt (University of Tübingen) · Amit Raj (Google) · Mark Boss (Stability AI) · Yunzhi Zhang (Stanford University) · Abhishek Kar (Google) · Yuanzhen Li (Massachusetts Institute of Technology) · Ricardo Martin-Brualla (Google) · Jonathan T. Barron (Google) · Deqing Sun (Google) · Hendrik Lensch (University of Tübingen) · Varun Jampani (Google Research)
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li (Tencent) · Qiang Nie (Tencent Youtu Lab) · Weifu Fu (Tencent Youtu Lab) · Yuhuan Lin (Tencent Youtu Lab) · Guangpin Tao (Tencent YoutuLab) · Yong Liu (Tencent Youtu Lab) · Chengjie Wang (Tencent Youtu Lab; Shanghai Jiao Tong University)
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha (Rochester Institute of Technology) · Yang Zou (Amazon) · Qiuyu Chen (Amazon) · Zhiheng Li (Amazon AGI) · Yusheng Xie (Amazon) · Siqi Deng (Amazon)
Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation
Mohammad Amin Shabani (Simon Fraser University) · Zhaowen Wang (Adobe Research) · Difan Liu (Adobe Research) · Nanxuan Zhao (Adobe Research) · Jimei Yang (Adobe Research) · Yasutaka Furukawa (Simon Fraser University)
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
Weiyi Lv (Shanghai University) · Yuhang Huang (National University of Defense Technology) · NING Zhang (PAII Inc.) · Ruei-Sung Lin (PAII Inc) · Mei Han (PAII Inc.) · Dan Zeng (Shanghai University)
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng (Nanyang Technological University) · Hang Zhang (Sichuan University) · Guanzheng Chen (SUN YAT-SEN UNIVERSITY) · Xin Li (Alibaba Group) · Shijian Lu (Nanyang Technological University) · Chunyan Miao (School of Computer Science and Engineering, Nanyang Technological University) · Lidong Bing (Alibaba DAMO Academy)
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang (Hong Kong University of Science and Technology) · Xin Yang (The Hong Kong University of Science and Technology) · Jiantao Lin (Hong Kong University of Science and Technology) · Haodong LI (Hong Kong University of Science and Technology) · Xiaogang Xu (Zhejiang Lab) · Ying-Cong Chen (The Hong Kong University of Science and Technology)
Preserving Fairness Generalization in Deepfake Detection
Li Lin · Xinan He (Nanchang University) · Yan Ju (State University of New York at Buffalo) · Xin Wang (State University of New York at Albany) · Feng Ding (Nanchang University) · Shu Hu (Purdue University)
Gradient Alignment for Cross-domain Face Anti-Spoofing
MINH BINH LE (Sungkyunkwan University) · Simon Woo (Sungkyunkwan University)
Multi-Object Tracking in the Dark
Xinzhe Wang (Beijing Institute of Technology) · Kang Ma (Beijing Institute of Technology) · Qiankun Liu (Beijing Institute of Technology) · Yunhao Zou (None) · Ying Fu (None)
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
Sikai Bai (The Hong Kong University of Science and Technology) · Jie ZHANG (The Hong Kong Polytechnic University) · Song Guo (Department of Computer Science and Engineering, Hong Kong University of Science and Technology) · Shuaicheng Li (Sensetime Group Limited) · Jingcai Guo (The Hong Kong Polytechnic University) · Jun Hou (Sensetime) · Tao Han (Northwestern Polytechnical University) · Xiaocheng Lu (Northwestern Polytechnical University)
Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models
Xingqian Xu (University of Illinois, Urbana Champaign) · Jiayi Guo (Tsinghua University) · Zhangyang Wang (University of Texas at Austin) · Gao Huang (Tsinghua University) · Irfan Essa (Georgia Institute of Technology) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
Holistic Features are almost Sufficient for Text-to-Video Retrieval
Kaibin Tian (None) · Ruixiang Zhao (None) · Zijie Xin (Sichuan University) · Bangxiang Lan (Renmin University of China) · Xirong Li (Renmin University of China)
HDQMF: Holographic Feature Decomposition Using Quantum Algorithms
Prathyush Poduval (University of California, Irvine) · Zhuowen Zou (University of California, Irvine) · Mohsen Imani (University of California, Irvine)
Rethinking Boundary Discontinuity Problem for Oriented Object Detection
Hang Xu (Hangzhou Dianzi University) · Xinyuan Liu (Institute of Computing Technology, Chinese Academy of Sciences) · Haonan Xu (ICT, Chinese Academy of Sciences) · Yike Ma (Chinese Academy of Sciences) · Zunjie Zhu (Hangzhou Dianzi University) · Chenggang Yan (Hangzhou Dianzi University, Tsinghua University) · Feng Dai (ICT, Chinese Academy of Sciences)
Fair-VPT: Fair Visual Prompt Tuning for Image Classification
Sungho Park (Yonsei University) · Hyeran Byun (Yonsei University)
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim (Kyung Hee University) · Sung Jin Um (Kyung Hee University) · Sangmin Lee (University of Illinois Urbana-Champaign) · Jung Uk Kim (Kyung Hee University)
Task-conditioned adaptation of visual features in multi-task policy learning
Pierre Marza (Institut National des Sciences Appliquées de Lyon) · Laetitia Matignon (LIRIS, CNRS) · Olivier Simonin (INSA de Lyon) · Christian Wolf (Naver Labs Europe)
Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
Jinxia Xie (Guangxi Normal University) · Bineng Zhong (Guangxi Normal University) · Zhiyi Mo (Wuzhou university) · Shengping Zhang (Harbin Institute of Technology) · Liangtao Shi (Guangxi Normal University) · Shuxiang Song (Guangxi Normal University) · Rongrong Ji (Xiamen University)
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong (KAIST) · Daehee Park (KAIST) · Kuk-Jin Yoon (KAIST)
Revisiting Single Image Reflection Removal In the Wild
Yurui Zhu (University of Science and Technology of China) · Bo Li (vivo Mobile Communication Co.,Ltd.) · Xueyang Fu (University of Science and Technology of China) · Peng-Tao Jiang (vivo Mobile Communication (Hangzhou) Co., Ltd.) · Hao Zhang (vivo Mobile Communication (Hangzhou) Co., Ltd.) · Qibin Sun (University of Science and Technology of China) · Zheng-Jun Zha (University of Science and Technology of China) · Jinwei Chen (vivo Mobile Communication Co., Ltd.)
Augmented Identity Distraction for Face Anonymization
Zhenzhong Kuang (Hangzhou Dianzi University) · Xiaochen Yang (Hangzhou Dianzi University) · Yingjie Shen (Hangzhou Dianzi University) · Chao Hu (Hangzhou Dianzi University) · Jun Yu (Hangzhou Dianzi University)
NeuRAD: Neural Rendering for Autonomous Driving
Adam Tonderski (Lund University) · Carl Lindström (Chalmers University of Technology) · Georg Hess (Chalmers University of Technology) · William Ljungbergh (Linköping University, Zenseact) · Lennart Svensson (Chalmers University of Technology) · Christoffer Petersson (Zenseact)
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Yi Yu (Southeast University) · Xue Yang (Shanghai AI Laboratory) · Qingyun Li (Harbin Institute of Technology) · Feipeng Da (Southeast University) · Jifeng Dai (Tsinghua University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Junchi Yan (Shanghai Jiao Tong University)
TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
Bo Sun (University of Texas, Austin) · Thibault Groueix (Adobe Systems) · Chen Song (University of Texas at Austin) · Qixing Huang (University of Texas at Austin) · Noam Aigerman (Université de Montréal)
Poly Kernel Inception Network for Remote Sensing Detection
Xinhao Cai (Nanjing University of Science and Technology) · Qiuxia Lai (Communication University of China) · Yuwei Wang (Nanjing University of Science and Technology) · Wenguan Wang (Zhejiang University) · Zeren Sun (Nanjing University of Science and Technology) · Yazhou Yao (Nanjing University of Science and Technology)
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Hmrishav Bandyopadhyay (University of Surrey) · Subhadeep Koley (University of Surrey) · Ayan Das (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Aneeshan Sain (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Tao Xiang (University of Surrey) · Yi-Zhe Song (None)
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
Mengyao Lyu (Tsinghua University) · Yuhong Yang · Haiwen Hong (Alibaba Group) · Hui Chen (Tsinghua University) · Xuan Jin (University of Science and Technology of China) · Yuan He (Alibaba Group) · Hui Xue (Zhejiang University, Tsinghua University) · Jungong Han (Aberystwyth University) · Guiguang Ding (Tsinghua University)
Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence
Ripon Saha (Arizona State University) · Dehao Qin (Clemson University) · Nianyi Li (None) · Jinwei Ye (None) · Suren Jayasuriya (Arizona State University)
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Hongwei Zheng (Meituan) · Linyuan Zhou (Meituan) · Han Li (Shanghai Jiaotong University) · Jinming Su (Meituan) · Xiaoming Wei (Meituan) · Xu Xiaoming (Meituan)
IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration
Tai Ma (East China Normal University) · zhangsuwei (East China Normal University) · Jiafeng Li (East China Normal University) · Ying Wen (East China Normal University)
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang (Shanghai Jiao Tong University) · Yue Xu (Shanghai Jiao Tong University) · Cewu Lu (Shanghai Jiao Tong University) · Yonglu Li (Shanghai Jiaotong University)
Frequency-Adaptive Dilated Convolution for Semantic Segmentation