CVPR 2024 Accepted Papers

Papers are assigned to poster sessions such that topics are maximally spread over sessions (attendees will find interesting papers at each session) while grouping similar posters within each poster session to minimize walking distances. We used a 1D t-SNE projection of the SPECTER paper embeddings to realize this assignment.
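
The page does not say how the 1D projection is turned into sessions. As a minimal sketch of one way it could work (not the organizers' actual procedure), papers can be sorted along the 1D axis and dealt round-robin into sessions. The function name `assign_sessions`, the paper titles, and the coordinates below are hypothetical; the coordinates are assumed to be precomputed, for example by a 1D t-SNE projection of paper embeddings.

```python
def assign_sessions(coords, n_sessions):
    """Deal papers into sessions by their 1D embedding coordinate.

    coords: dict mapping paper title -> 1D coordinate (assumed precomputed).
    Returns a list of n_sessions lists of titles, each in coordinate order.
    """
    if n_sessions < 1:
        raise ValueError("n_sessions must be at least 1")
    # Sort papers along the 1D topic axis; ties are broken by title
    # so the result is deterministic.
    ordered = sorted(coords, key=lambda title: (coords[title], title))
    sessions = [[] for _ in range(n_sessions)]
    # Round-robin dealing: neighbours on the axis land in different
    # sessions, so every session samples the whole topic range.
    for rank, title in enumerate(ordered):
        sessions[rank % n_sessions].append(title)
    # Each session list is still in coordinate order, so posters that are
    # adjacent in a list are also close on the topic axis.
    return sessions


# Hypothetical coordinates, for illustration only.
example_coords = {
    "paper A": -2.1, "paper B": -1.9,
    "paper C": 0.3, "paper D": 0.4,
    "paper E": 2.5, "paper F": 2.7,
}
example_sessions = assign_sessions(example_coords, 2)
```

With two sessions, each session receives one paper from each of the three topic clusters in the example, and placing posters in list order keeps topically close posters physically near each other.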

Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang (Xi'an Jiaotong University) · Lei-lei Li (Chang'an University) · Junfei Zhou (Chang'an University) · Junbin Xiao (None) · Hongkai Yu (Cleveland State University) · Chen Lv (Nanyang Technological University) · Jianru Xue (Xi'an Jiaotong University, Tsinghua University) · Tat-seng Chua (National University of Singapore)
FedHCA$^2$: Towards Hetero-Client Federated Multi-Task Learning
Yuxiang Lu (Shanghai Jiao Tong University) · Suizhi Huang (Shanghai Jiao Tong University) · Yuwen Yang (Shanghai Jiao Tong University) · Shalayiding Sirejiding () · Yue Ding (Shanghai Jiao Tong University) · Hongtao Lu (Shanghai Jiao Tong University)
Re-thinking Data Availability Attacks Against Deep Neural Networks
Bin Fang (Shanghai Jiao Tong University) · Bo Li (vivo Mobile Communication Co., Ltd.) · Shuang Wu (Tencent YouTu Lab) · Shouhong Ding (Tencent YouTu Lab) · Ran Yi (Shanghai Jiao Tong University) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
Lennart Bastian (None) · Yizheng Xie (Technische Universität München) · Nassir Navab (TU Munich) · Zorah Lähner (Rheinische Friedrich-Wilhelms Universität Bonn)
Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space
Naveen Kumar Kummari (Indian Institute of Technology Hyderabad, India) · Reshmi Mitra (Southeast Missouri State University) · Krishna Mohan Chalavadi (Indian Institute of Technology Hyderabad)
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Bor Shiun Wang (National Yang Ming Chiao Tung University) · Chien-Yi Wang (NVIDIA) · Wei-Chen Chiu (None)
3D Feature Tracking via Event Camera
Siqi Li (Tsinghua University) · Zhou Zhikuan (None) · Zhou Xue (Li Auto) · Yipeng Li (Tsinghua University) · Shaoyi Du (Xi'an Jiaotong University) · Yue Gao (Tsinghua University)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
Jianan Fan (University of Sydney) · Dongnan Liu (University of Sydney) · Hang Chang (Lawrence Berkeley National Lab) · Heng Huang (University of Pittsburgh) · Mei Chen () · Weidong Cai (The University of Sydney)
Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen (Xidian University) · Zhiyu Zhu (City University of Hong Kong) · Yifan Zhang (City University of Hong Kong) · Junhui Hou (City University of Hong Kong) · Guangming Shi (Xidian University) · Jinjian Wu (Xidian University)
DETRs Beat YOLOs on Real-time Object Detection
Yian Zhao (Peking University) · Wenyu Lv (Baidu) · Shangliang Xu (Baidu) · Jinman Wei (Tianjin University) · Guanzhong Wang (Baidu) · Qingqing Dang (Baidu) · Yi Liu (None) · Jie Chen (Peking University)
Neural Clustering based Visual Representation Learning
Guikun Chen (Zhejiang University) · Xia Li (Department of Computer Science, ETH Zurich) · Yi Yang (Zhejiang University) · Wenguan Wang (Zhejiang University)
I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
Chengfeng Zhao (ShanghaiTech University) · Juze Zhang (ShanghaiTech University) · Jiashen Du (None) · Ziwei Shan (ShanghaiTech University) · Junye Wang (ShanghaiTech University) · Jingyi Yu (ShanghaiTech University) · Jingya Wang (ShanghaiTech University) · Lan Xu (ShanghaiTech University)
AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
Zixiang Zhou (xiaobing.ai) · Yu Wan () · Baoyuan Wang (Xiaobing.ai)
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang (Fudan University) · Dingkang Yang (Fudan University) · Zhaoyu Chen (Fudan University) · Yang Liu (Fudan University) · Siao Liu (Fudan University) · Wenqiang Zhang (None) · Lihua Zhang (Fudan University) · Lizhe Qi (Fudan University)
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang (University of California, San Diego) · Yang Fu (University of California San Diego) · Zheng Ding (University of California, San Diego) · Sifei Liu (NVIDIA) · Zhuowen Tu (University of California, San Diego) · Xiaolong Wang (UCSD)
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang (Tsinghua University) · Jian Tao (Tsinghua University) · Jiafei Lyu (Tsinghua University) · Chunjiang Ge (Control Science and Technology, Tsinghua University) · Jiaxin Chen (Parametrix.ai) · Weihan Shen (Parametrix) · Xiaolong Zhu (Parametrix) · Xiu Li (Tsinghua University)
Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts
Jiayi Chen (Northwestern Polytechnical University) · Benteng Ma (Hong Kong University of Science and Technology) · Hengfei Cui (Northwest Polytechnical University Xi'an) · Kwang-Ting Cheng (Hong Kong University of Science and Technology) · Yong Xia (Northwestern Polytechnical University)
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng (ByteDance) · Chao Wen (ByteDance) · Zhuo Su (ByteDance) · Zeran Xu (ByteDance) · Zhaohu Li (ByteDance) · Yang Zhao (ByteDance) · Zhou Xue (Li Auto)
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li (University of Electronic Science and Technology of China) · Yuke Li (Wuhan University) · Zhekai Du (University of Electronic Science and Technology of China) · Fengling Li (University of Technology Sydney) · Ke Lu (University of Electronic Science and Technology of China) · Jingjing Li (University of Electronic Science and Technology of China)
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
Qing Yu (LY Corporation) · Mikihiro Tanaka (LY Corporation) · Kent Fujiwara (LY Corporation)
Structure-Guided Adversarial Training of Diffusion Models
Ling Yang (Peking University) · Haotian Qian (Peking University) · Zhilong Zhang (Peking University) · Jingwei Liu (Peking University) · Bin CUI (Peking University)
MonoHair: High-Fidelity Hair Modeling from a Monocular Video
Keyu Wu (Zhejiang University) · LINGCHEN YANG (ETHZ - ETH Zurich) · Zhiyi Kuang (Zhejiang University) · Yao Feng (None) · Xutao Han (Zhejiang University) · Yuefan Shen (Zhejiang University) · Hongbo Fu (City University of Hong Kong) · Kun Zhou (Zhejiang University) · Youyi Zheng (Zhejiang University)
CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images
Changsheng Chen (Shenzhen University) · Liangwei Lin (Shenzhen University) · Yongqi Chen (Shenzhen University) · Bin Li (Shenzhen University) · Jishen Zeng (Alibaba Group) · Jiwu Huang (Shenzhen University)
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai (Max Planck Institute for Informatics) · Hiroyasu Akada (Max Planck Institute for Informatics) · Jian Wang (Max Planck Institute for Informatics) · Diogo Luvizon (Saarland Informatics Campus, Max-Planck Institute) · Christian Theobalt (MPI Informatik) · Vladislav Golyanik (MPI for Informatics)
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song () · Wenhao Chai (University of Washington) · Guanhong Wang (Zhejiang University) · Haoyang Zhou (Zhejiang University) · Feiyang Wu (Zhejiang University) · Yucheng Zhang (Zhejiang University) · Tian Ye (Hong Kong University of Science and Technology, Guangzhou Campus) · Haozhe Chi (Zhejiang University) · Xun Guo (Microsoft Research Asia) · Yanting Zhang (Donghua University, Shanghai) · Yan Lu (Microsoft Research Asia) · Jenq-Neng Hwang (None) · Gaoang Wang (Zhejiang University)
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning
Da-Wei Zhou (Nanjing University) · Hai-Long Sun (Nanjing University) · Han-Jia Ye (Nanjing University) · De-Chuan Zhan (Nanjing University)
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman (University of Texas at Austin) · Andrew Westbury (Facebook AI Research) · Lorenzo Torresani (Facebook) · Kris Kitani (Carnegie Mellon University) · Jitendra Malik (University of California at Berkeley) · Triantafyllos Afouras (University of Oxford) · Kumar Ashutosh (UT Austin & FAIR, Meta) · Vijay Baiyya (University of Louisiana at Lafayette) · Siddhant Bansal (University of Bristol, UK) · Bikram Boote (University of Illinois, Urbana Champaign) · Eugene Byrne (Meta) · Zachary Chavis (University of Minnesota) · Joya Chen (National University of Singapore) · Feng Cheng (University of North Carolina at Chapel Hill) · Fu-Jen Chu (Facebook) · Sean Crane (School of Computer Science, Carnegie Mellon University) · Avijit Dasgupta (IIIT Hyderabad) · Jing Dong (Meta) · Maria Escobar (Universidad de Los Andes) · Cristhian David Forigua Diaz (Reblink) · Abrham Gebreselasie (Carnegie Mellon University) · Sanjay Haresh (Qualcomm Inc, QualComm) · Jing Huang (Facebook) · Md Mohaiminul Islam (UNC Chapel Hill) · Suyog Jain (Meta) · Rawal Khirodkar (Meta) · Devansh Kukreja (Carnegie Mellon University) · Kevin Liang (FAIR at Meta) · Jia-Wei Liu (National University of Singapore) · Sagnik Majumder (UT Austin & Meta AI) · Yongsen Mao (Simon Fraser University) · Miguel Martin (Meta Platforms, Inc.) · Effrosyni Mavroudi () · Tushar Nagarajan (Meta) · Francesco Ragusa (None) · Santhosh Kumar Ramakrishnan (University of Texas, Austin) · Luigi Seminara (University of Catania) · Arjun Somayazulu (University of Texas at Austin) · Yale Song (Meta) · Shan Su (University of Pennsylvania) · Zihui Xue (None) · Edward Zhang (University of Pennsylvania) · Jinxu Zhang (University of Pennsylvania) · Angela Castillo (Universidad de Los Andes) · Changan Chen (University of Texas at Austin) · Fu Xinzhu (National University of Singapore) · Ryosuke Furuta (The University of Tokyo) · Cristina González (Universidad de Los Andes) · Gupta (None) · Jiabo Hu (Facebook) · Yifei Huang (The University of Tokyo) · Yiming Huang (University of Pennsylvania) · Weslie Khoo (Indiana University) · Anush Kumar (Torc Robotics) · Robert Kuo (Facebook) · Sach Lakhavani (None) · Miao Liu (META AI) · Mi Luo (The University of Texas at Austin) · Zhengyi Luo (Carnegie Mellon University) · Brighid Meredith (Meta) · Austin Miller (Meta) · Oluwatumininu Oguntola (University of North Carolina at Chapel Hill) · Xiaqing Pan (Meta) · Penny Peng (Meta) · Shraman Pramanick (None) · Merey Ramazanova (KAUST) · Fiona Ryan (Georgia Institute of Technology) · Wei Shan (University of North Carolina at Chapel Hill) · Kiran Somasundaram (None) · Chenan Song (National University of Singapore) · Audrey Southerland (Georgia Institute of Technology) · Masatoshi Tateno (AIST, National Institute of Advanced Industrial Science and Technology) · Huiyu Wang (Facebook) · Yuchen Wang (Indiana University) · Takuma Yagi (None) · Mingfei Yan (None) · Xitong Yang (Meta) · Zecheng Yu (University of Tokyo) · Shengxin Zha (Meta GenAI) · Chen Zhao (King Abdullah University of Science and Technology (KAUST)) · Ziwei Zhao (Indiana University) · Zhifan Zhu (University of Bristol) · Jeff Zhuo (University of North Carolina at Chapel Hill) · Pablo ARBELAEZ (Universidad de los Andes) · Gedas Bertasius (UNC Chapel Hill) · Dima Damen (University of Bristol and Google DeepMind) · Jakob Engel (Research, Meta Reality Labs) · Giovanni Maria Farinella (University of Catania, Italy) · Antonino Furnari (University of Catania) · Bernard Ghanem (KAUST) · Judy Hoffman (Georgia Institute of Technology) · C.V. Jawahar (IIIT-Hyderabad) · Richard Newcombe (Meta, Reality Labs Research) · Hyun Soo Park (The University of Minnesota) · James Rehg (None) · Yoichi Sato (University of Tokyo) · Manolis Savva (Simon Fraser University) · Jianbo Shi (None) · Mike Zheng Shou (National University of Singapore) · Michael Wray (University of Bristol)
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna (Georgia Institute of Technology) · Yongsen Mao (Simon Fraser University) · Hanxiao Jiang (University of Illinois Urbana-Champaign) · Sanjay Haresh (Qualcomm Inc, QualComm) · Brennan Shacklett (Stanford University) · Dhruv Batra (FAIR (Meta) and Georgia Tech) · Alexander William Clegg (Meta AI) · Eric Undersander (Meta) · Angel Xuan Chang (Simon Fraser University) · Manolis Savva (Simon Fraser University)
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Xiangyang Zhu (City University of Hong Kong) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Bowei He (City University of Hong Kong) · Ziyu Guo (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Jiaming Liu (Peking University) · Han Xiao (The Chinese University of Hong Kong & Shanghai AI Laboratory) · Chaoyou Fu (Institute of Automation, Chinese Academy of Sciences) · Hao Dong (None) · Peng Gao (The Chinese University of Hong Kong)
Brain Decodes Deep Nets
Huzheng Yang (University of Pennsylvania) · James Gee (University of Pennsylvania) · Jianbo Shi (None)
Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
Qunliang Xing (Beihang University) · Mai Xu (Beihang University, Tsinghua University) · Shengxi Li (Beihang University) · Xin Deng (Beijing University of Aeronautics and Astronautics) · Meisong Zheng (Alibaba Group) · huaida liu (Alibaba Group) · Ying Chen (Alibaba Group)
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Yabin Zhang (The Hong Kong Polytechnic University) · Wenjie Zhu (None) · Hui Tang (Hong Kong University of Science and Technology) · Zhiyuan Ma (None) · Kaiyang Zhou (Hong Kong Baptist University) · Lei Zhang (The Hong Kong Polytechnic University)
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Yaofang Liu (City University of Hong Kong) · Xiaodong Cun (Tencent AI Lab) · Xuebo Liu (Harbin Institute of Technology, Shenzhen) · Xintao Wang (Tencent) · Yong Zhang (Tencent AI Lab) · Haoxin Chen (Tencent AI Lab) · Yang Liu (National University of Defense Technology) · Tieyong Zeng (The Chinese University of Hong Kong) · Raymond Chan (City University of Hong Kong) · Ying Shan (Tencent)
PHYSCENE: Physically Interactable 3D Scene Synthesis for Embodied AI
Yandan Yang (Beijing Institute for General Artificial Intelligence) · Baoxiong Jia (Beijing Institute for General Artificial Intelligence (BIGAI)) · Peiyuan Zhi (Beijing Institute for General Artificial Intelligence) · Siyuan Huang (Beijing Institute for General Artificial Intelligence)
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang (None) · Jianmin Bao (Microsoft) · Wenming Weng (None) · Ruoyu Feng (University of Science and Technology of China) · Dacheng Yin (University of Science and Technology of China) · Tao Yang (Xi'an Jiaotong University) · Jingxu Zhang (Research, Microsoft) · Qi Dai (Microsoft Research Asia) · Zhiyuan Zhao (Tencent) · Chunyu Wang (Microsoft) · Kai Qiu (Microsoft) · Yuhui Yuan (Microsoft Research Asia) · Xiaoyan Sun (University of Science and Technology of China) · Chong Luo (Microsoft Research Asia) · Baining Guo (Microsoft Research)
PH-Net: Semi-Supervised Breast Lesion Segmentation via Patch-wise Hardness
Siyao Jiang (Shenzhen University) · Huisi Wu (Shenzhen University) · Junyang Chen () · Qin Zhang (Shenzhen University) · Jing Qin (Hong Kong Polytechnic University)
Towards Text-guided 3D Scene Composition
Qihang Zhang (The Chinese University of Hong Kong) · Chaoyang Wang (Snap Inc) · Aliaksandr Siarohin (Snap Inc.) · Peiye Zhuang (Snap Inc.) · Yinghao Xu (Chinese University of Hong Kong) · Ceyuan Yang (The Chinese University of Hong Kong) · Dahua Lin (The Chinese University of Hong Kong) · Bolei Zhou (University of California, Los Angeles) · Sergey Tulyakov (Snap Inc.) · Hsin-Ying Lee (Snap Inc.)
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
Xiaolong Deng (Shenzhen University) · Huisi Wu (Shenzhen University) · Runhao Zeng (Shenzhen MSU-BIT University) · Jing Qin (Hong Kong Polytechnic University)
BioCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens (Ohio State University, Columbus) · Jiaman Wu (Ohio State University, Columbus) · Matthew Thompson (Ohio State University, Columbus) · Elizabeth Campolongo (The Ohio State University) · Chan Hee Song (The Ohio State University) · David Carlyn (Ohio State University) · Li Dong (Microsoft Research) · Wasila Dahdul (University of California, Irvine) · Charles Stewart (Rensselaer Polytechnic Institute) · Tanya Berger-Wolf (Ohio State University) · Wei-Lun Chao (Ohio State University) · Yu Su (The Ohio State University)
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil (Ohio State University, Columbus) · Chan Hee Song (The Ohio State University) · Boyuan Zheng (Ohio State University, Columbus) · Xiang Deng (Google) · Yu Su (The Ohio State University) · Wei-Lun Chao (Ohio State University)
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
Haipeng Liu (Hefei University of Technology) · Yang Wang (Hefei University of Technology) · Biao Qian (None) · Meng Wang (Hefei University of Technology) · Yong Rui (Lenovo)
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Nikhil Keetha (Carnegie Mellon University) · Jay Karhade (Carnegie Mellon University) · Krishna Murthy Jatavallabhula (Massachusetts Institute of Technology) · Gengshan Yang (Meta) · Sebastian Scherer (None) · Deva Ramanan (Carnegie Mellon University) · Jonathon Luiten (RWTH Aachen University)
OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
Yuchen Pan (Harbin Institute of Technology) · Junjun Jiang (Harbin Institute of Technology) · Kui Jiang (Harbin Institute of Technology) · Zhihao Wu (Harbin Institute of Technology, Shenzhen) · Keyuan Yu (Harbin Institute of Technology) · Xianming Liu (Harbin Institute of Technology)
Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling
Jianan Li (University of Chinese Academy of Sciences) · Qiulei Dong (Institute of Automation, Chinese Academy of Sciences)
From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation
Javier Tirado-Garín (I3A, Universidad de Zaragoza) · Javier Civera (I3A, Universidad de Zaragoza)
FADES: Fair Disentanglement with Sensitive Relevance
Taeuk Jang (Purdue University) · Xiaoqian Wang (Purdue University)
Visual Anagrams: Synthesizing Multi-View Optical Illusions with Diffusion Models
Daniel Geng (University of Michigan) · Inbum Park (University of Michigan) · Andrew Owens (University of Michigan)
SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology
Saarthak Kapse (State University of New York at Stony Brook) · Pushpak Pati (International Business Machines) · Srijan Das (University of North Carolina at Charlotte) · Jingwei Zhang (None) · Chao Chen (State University of New York, Stony Brook) · Maria Vakalopoulou (CentraleSupelec) · Joel Saltz (State University of New York at Stony Brook) · Dimitris Samaras (Stony Brook University) · Rajarsi Gupta (Academic medical center at State University of New York at Stony Brook) · Prateek Prasanna (State University of New York, Stony Brook)
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma (Purdue University) · Can Cui (Purdue University) · Xu Cao (University of Illinois Urbana-Champaign) · Wenqian Ye (University of Virginia) · Peiran Liu (Purdue University) · Juanwu Lu (Purdue University) · Amr Abdelraouf (None) · Rohit Gupta (Toyota Motor Corporation) · Kyungtae Han (Toyota Motor North America) · Aniket Bera (Purdue University) · James Rehg (None) · Ziran Wang (Purdue University)
Low-Latency Neural Stereo Streaming
Qiqi Hou (Qualcomm Inc, QualComm) · Farzad Farhadzadeh (Qualcomm Inc, QualComm) · Amir Said (Qualcomm Inc, QualComm) · Guillaume Sautiere (Qualcomm Inc, QualComm) · Hoang Le (Qualcomm AI Research)
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye (Alibaba Group) · Haiyang Xu (Alibaba Group) · Jiabo Ye (East China Normal University) · Ming Yan (Alibaba Group) · Anwen Hu (Alibaba Group) · Haowei Liu (Institute of Automation, Chinese Academy of Sciences) · Qi Qian (Alibaba Group) · Ji Zhang (Alibaba Group) · Fei Huang (Alibaba Group)
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu (Beijing Jiaotong University) · Zichang Tan (Baidu) · Chuangchuang Tan (Beijing Jiaotong University) · Yunchao Wei (Beijing Jiaotong University) · Jingdong Wang (Baidu) · Yao Zhao (Beijing Jiaotong University)
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan (Alibaba Group) · Sibo Song (Alibaba Group) · Wenwen Yu (Huazhong University of Science and Technology) · Yuliang Liu (Huazhong University of Science and Technology) · Wenqing Cheng (Huazhong University of Science and Technology) · Fei Huang (Alibaba Group) · Xiang Bai (Huazhong University of Science and Technology) · Cong Yao (Alibaba DAMO Academy) · Zhibo Yang (Alibaba Group)
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Jingxuan Xu (Beijing Jiaotong University) · Wuyang Chen (Simon Fraser University) · Yao Zhao (Beijing Jiaotong University) · Yunchao Wei (Beijing Jiaotong University)
LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction
Linqing Zhao (Tsinghua University) · Xiuwei Xu (Tsinghua University) · Ziwei Wang (Tsinghua University) · Yunpeng Zhang (PhiGent Robotics) · Borui Zhang (Tsinghua University) · Wenzhao Zheng (Tsinghua University) · Dalong Du (PhiGent Robotics) · Jie Zhou (None) · Jiwen Lu (Tsinghua University)
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen (MIT) · Zhuo Xu (Google DeepMind) · Sean Kirmani (Google DeepMind) · brian ichter (Google) · Dorsa Sadigh (Google) · Leonidas Guibas (Stanford University) · Fei Xia (Google)
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Peihao Wang (University of Texas, Austin) · Dejia Xu (University of Texas at Austin) · Zhiwen Fan (University of Texas, Austin) · Dilin Wang (Facebook) · Sreyas Mohan (Meta) · Forrest Iandola (Meta) · Rakesh Ranjan () · Yilei Li (Facebook) · Qiang Liu (University of Texas, Austin) · Zhangyang Wang (University of Texas at Austin) · Vikas Chandra (Facebook)
HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images
Xihe Yang (The Chinese University of Hong Kong, Shenzhen) · Xingyu Chen (Xiaobing.AI) · Daiheng Gao (Alibaba) · Finn Wong (Xiaobing.AI) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen) · Baoyuan Wang (Xiaobing.ai)
Hearing Anything Anywhere
Mason Wang (Stanford University) · Ryosuke Sawata (Sony Research) · Samuel Clarke (Stanford University) · Ruohan Gao (Meta Reality Labs) · Shangzhe Wu (Stanford University) · Jiajun Wu (Stanford University)
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai (Johns Hopkins University) · Xinyang Geng (Google) · Karttikeya Mangalam (University of California Berkeley) · Amir Bar (TAU / UC Berkeley) · Alan L. Yuille (Johns Hopkins University) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Jitendra Malik (University of California at Berkeley) · Alexei A. Efros (UC Berkeley)
Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing
Ling Lo (None) · Cheng Yeo (National Chiao Tung University) · Hong-Han Shuai (None) · Wen-Huang Cheng (National Taiwan University)
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Rui Song (Technical University of Munich) · Chenwei Liang (Fraunhofer) · Hu Cao (Technical University of Munich) · Zhiran Yan (Technische Hochschule Ingolstadt) · Walter Zimmer (Technical University of Munich (TUM)) · Markus Gross (Fraunhofer IVI) · Andreas Festag (Technische Hochschule Ingolstadt) · Alois Knoll (Technical University Munich)
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu (None) · Tianbin (None) · Quanfeng Lu (Shanghai AI Laboratory) · Wenqi Shao (The Chinese University of Hong Kong) · Junjun He (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Ping Luo (The University of Hong Kong)
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang (Zhejiang University) · Sha Zhang (None) · Di Huang (University of Sydney) · Xiaoyang Wu (The University of Hong Kong) · Haoyi Zhu (University of Science and Technology of China) · Tong He (Shanghai AI Lab) · SHIXIANG TANG (The Chinese University of Hong Kong) · Hengshuang Zhao (The University of Hong Kong) · Qibo Qiu (Zhejiang Lab) · Binbin Lin (Zhejiang University) · Xiaofei He (Zhejiang University) · Wanli Ouyang (University of Sydney)
Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation
Xu Zheng (HKUST) · Pengyuan Zhou (Aarhus University) · ATHANASIOS (ICT) · Lin Wang (Hong Kong University of Science and Technology)
In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging
Xin Wang (Beijing Institute of Technology) · Lizhi Wang (None) · Xiangtian Ma (Beijing Institute of Technology) · Maoqing Zhang (Beijing Institute of Technology) · Zhu Lin (None) · Hua Huang (Beijing Normal University)
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang (Columbia University) · Yuhao Wu (Duke University) · Elsa Angelini (Télécom ParisTech) · Ang Li (University of Maryland, College Park) · Jia Guo (Columbia University) · Jerod Rasmussen (University of California, Irvine) · Thomas O'Connor (University of Rochester) · Pathik Wadhwa (University of California, Irvine) · Andrea Jackowski (None) · Hai Li (Duke University) · Jonathan Posner (Duke University) · Andrew Laine (Columbia University) · YUN WANG (None)
Seeing Motion at Nighttime with an Event Camera
Haoyue Liu (Huazhong University of Science and Technology) · Shihan Peng (Huazhong University of Science and Technology) · Zhu Lin (None) · Yi Chang (Huazhong University of Science and Technology) · Hanyu Zhou (Huazhong University of Science and Technology) · Luxin Yan (Huazhong University of Science and Technology)
Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration
Chen Zhao (None) · Weiling Cai (Nanjing Normal University) · Chenyu Dong (Nanjing Normal University) · Chengwei Hu (Nanjing Normal University)
StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
Jeongho Kim (KAIST) · Gyojung Gu (Korea Advanced Institute of Science and Technology) · Minho Park (KAIST) · Sunghyun Park (Qualcomm Inc, QualComm) · Jaegul Choo (Korea Advanced Institute of Science and Technology)
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie (Jilin University) · Chu-Jun Peng (National Yang Ming Chiao Tung University) · Yu-Wen Tseng (Department of Computer Science and Information Engineering, National Taiwan University) · Hung-Jen Chen (National Yang Ming Chiao Tung University) · Chan-Feng Hsu (National Chiao Tung University) · Hong-Han Shuai (None) · Wen-Huang Cheng (National Taiwan University)
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Xin Guo (Ant Group) · Jiangwei Lao (Ant Group) · Bo Dang (Wuhan University) · Yingying Zhang (Hikvision Research Institute) · Lei Yu (Ant Group) · Lixiang Ru (Ant Group) · Liheng Zhong (Ant Group) · Ziyuan Huang (National University of Singapore) · Kang Wu (Wuhan University) · Dingxiang Hu (mybank) · HUIMEI HE (Ant Group) · Jian Wang (Institute of Automation, Chinese Academy of Sciences) · Jingdong Chen (Ant Group) · Ming Yang (Ant Group) · Yongjun Zhang (None) · Yansheng Li (Wuhan University)
Unifying Top-down and Bottom-up Scanpath Prediction using Transformers
Zhibo Yang (State University of New York, Stony Brook) · Sounak Mondal (State University of New York, Stony Brook) · Seoyoung Ahn (State University of New York, Stony Brook) · Ruoyu Xue (State University of New York at Stony Brook) · Gregory Zelinsky (State University of New York at Stony Brook) · Minh Hoai (State University of New York, Stony Brook) · Dimitris Samaras (Stony Brook University)
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge (University of Southern California) · Yihe Tang (Stanford University) · Jiashu Xu (University of Southern California) · Cem Gokmen (Stanford University) · Chengshu Li (Stanford University) · Wensi Ai (Stanford University) · Benjamin Martinez (Stanford University) · Arman Aydin (Stanford University) · Mona Anvari (Computer Science Department, Stanford University) · Ayush Chakravarthy (Stanford University) · Hong-Xing Yu (Computer Science Department, Stanford University) · Josiah Wong (Stanford University) · Sanjana Srivastava (Stanford University) · Sharon Lee (Stanford University) · Shengxin Zha (Meta GenAI) · Laurent Itti (USC) · Yunzhu Li (University of Illinois Urbana-Champaign) · Roberto Martín-Martín (University of Texas at Austin) · Miao Liu (META AI) · Pengchuan Zhang (Meta AI) · Ruohan Zhang (Stanford University) · Li Fei-Fei (Stanford University) · Jiajun Wu (Stanford University)
Event-based Visible and Infrared Fusion via Multi-task Collaboration
Mengyue Geng (Peking University) · Zhu Lin (None) · Lizhi Wang (None) · Wei Zhang (Peng Cheng Laboratory) · Ruiqin Xiong (Peking University) · Yonghong Tian (Peking University)
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai (Department of Computer Science, University of Wisconsin, Madison) · Haotian Liu (University of Wisconsin-Madison) · Siva Mustikovela (Heidelberg University) · Gregory P. Meyer (Cruise) · Yuning Chai (Cruise) · Dennis Park (Toyota Research Institute) · Yong Jae Lee (Department of Computer Sciences, University of Wisconsin - Madison)
OpenEQA: Embodied Question Answering in the Era of Foundation Models
Arjun Majumdar (Georgia Institute of Technology) · Anurag Ajay (Massachusetts Institute of Technology) · Xiaohan Zhang (State University of New York at Binghamton) · Sriram Yenamandra (Georgia Institute of Technology) · Mikael Henaff (Facebook) · Alexander Sax (University of California Berkeley) · Sneha Silwal (AI at Meta) · Paul McVay (Meta) · Oleksandr Maksymets (Facebook) · Sergio Arnaud (None) · Pranav Putta (Georgia Institute of Technology) · Karmesh Yadav (Meta AI) · Qiyang Li (University of California Berkeley) · Benjamin Newman (Meta Platforms) · Mohit Sharma (Carnegie Mellon University) · Vincent-Pierre Berges (Meta) · Shiqi Zhang (State University of New York at Binghamton) · Pulkit Agrawal (Massachusetts Institute of Technology) · Dhruv Batra (FAIR (Meta) and Georgia Tech) · Yonatan Bisk (Carnegie Mellon University) · Mrinal Kalakrishnan (Meta) · Franziska Meier (Facebook) · Chris Paxton (Meta) · Aravind Rajeswaran (Facebook AI Research)
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang () · Hao Wen (Beijing University of Aeronautics and Astronautics) · Junting Dong (None) · Yaohui Wang (Shanghai AI Laboratory) · Yangguang Li (Shanghai AI Laboratory) · Xinyuan Chen (Shanghai Artificial Intelligence Laboratory) · Yan-Pei Cao (Tencent ARC Lab) · Ding Liang (Tsinghua University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Bo Dai (Shanghai AI Laboratory) · Lu Sheng (Beihang University)
BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection
Wenjie Wang (Zhejiang University) · Yehao Lu (Zhejiang University) · Guangcong Zheng (None) · Shuigenzhan (Zhejiang University) · Xiaoqing Ye (Baidu Inc.) · Zichang Tan (Baidu) · Jingdong Wang (Baidu) · Gaoang Wang (Zhejiang University) · Xi Li (Zhejiang University)
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
Wenqi Jia (None) · Miao Liu (META AI) · Hao Jiang (Facebook) · Ishwarya Ananthabhotla (Meta Reality Labs Research) · James Rehg (None) · Vamsi Krishna Ithapu (Facebook Reality Labs) · Ruohan Gao (Meta Reality Labs)
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
Xiao Wang (Anhui University) · Shiao Wang (None) · Chuanming Tang (University of Chinese Academy of Sciences) · Zhu Lin (None) · Bo Jiang (Anhui University) · Yonghong Tian (Peking University) · Jin Tang (Anhui University)
Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu (Tsinghua University) · Chong Xia (Tsinghua University) · Ziwei Wang (Tsinghua University) · Linqing Zhao (Tsinghua University) · Yueqi Duan (None) · Jie Zhou (None) · Jiwen Lu (Tsinghua University)
MultiDiff: Consistent Novel View Synthesis from a Single Image
Norman Müller (Meta) · Katja Schwarz (None) · Barbara Roessle (Technische Universität München) · Lorenzo Porzi (Facebook) · Samuel Rota Bulò (Meta) · Matthias Nießner (Technical University of Munich) · Peter Kontschieder (Meta)
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Linshan Wu (HKUST) · Jia-Xin Zhuang (Department of Computer Science and Engineering, Hong Kong University of Science and Technology) · Hao Chen (The Hong Kong University of Science and Technology)
PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
Ruining Deng (Vanderbilt University) · Quan Liu (Vanderbilt University) · Can Cui (Vanderbilt University) · Tianyuan Yao (Vanderbilt University) · Jialin Yue (Vanderbilt University) · Juming Xiong (Vanderbilt University) · Lining yu (Vanderbilt University) · Yifei Wu (Vanderbilt University) · Mengmeng Yin (Vanderbilt University) · Yu Wang (Vanderbilt University Medical Center) · Shilin Zhao (Vanderbilt University) · Yucheng Tang (NVIDIA) · Haichun Yang (Vanderbilt University Medical School) · Yuankai Huo (Vanderbilt University)
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang (Peking University) · Haiyang Xu (Alibaba Group) · Mengfan Dong (Peking University) · Jiaxing Chen (Peking University) · Wei Ye (Peking University) · Ming Yan (Alibaba Group) · Qinghao Ye (Alibaba Group) · Ji Zhang (Alibaba Group) · Fei Huang (Alibaba Group) · Shikun Zhang (Peking University)
Self-Supervised Facial Representation Learning with Facial Region Awareness
Zheng Gao (Queen Mary, University of London) · Ioannis Patras (Queen Mary University of London)
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Shijie Zhou (University of California, Los Angeles) · Haoran Chang (University of California, Los Angeles) · Sicheng Jiang (University of California, Los Angeles) · Zhiwen Fan (University of Texas, Austin) · Zehao Zhu (University of Texas at Austin) · Dejia Xu (University of Texas at Austin) · Pradyumna Chari (University of California, Los Angeles) · Suya You (University of Southern California) · Zhangyang Wang (University of Texas at Austin) · Achuta Kadambi (UCLA)
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Moreno D'Incà (University of Trento) · Elia Peruzzo (University of Trento) · Massimiliano Mancini (University of Trento) · Dejia Xu (University of Texas at Austin) · Vidit Goel (Georgia Tech | UIUC / Oregon | PAIR) · Xingqian Xu (University of Illinois, Urbana Champaign) · Zhangyang Wang (University of Texas at Austin) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR) · Nicu Sebe (University of Trento)
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
Vidit Goel (Georgia Tech | UIUC / Oregon | PAIR) · Elia Peruzzo (University of Trento) · Yifan Jiang (University of Texas at Austin) · Dejia Xu (University of Texas at Austin) · Xingqian Xu (University of Illinois, Urbana Champaign) · Nicu Sebe (University of Trento) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Zhangyang Wang (University of Texas at Austin) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou (Zhejiang University) · Jiazheng Xing (Zhejiang University) · Yijie Qian (Zhejiang University) · Yaowei Guo (Zhejiang University) · Shuo Xin (Zhejiang University of Technology) · Junhao Chen (Zhejiang University) · Kai Tang (Zhejiang University) · Mengmeng Wang (Zhejiang University of Technology) · Zhengkai Jiang (Tencent) · Liang Liu (Tencent Youtu Lab) · Yong Liu (Zhejiang University)
SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream
Zhu Lin (None) · Kangmin Jia (Beijing Institute of Technology) · Yifan Zhao (Beihang University) · Yunshan Qi (BeiHang University) · Lizhi Wang (None) · Hua Huang (Beijing Normal University)
Edit One for All: Interactive Batch Image Editing
Thao Nguyen (UW-Madison 🦡) · Utkarsh Ojha (University of Wisconsin - Madison) · Yuheng Li (University of Wisconsin - Madison) · Haotian Liu (University of Wisconsin-Madison) · Yong Jae Lee (Department of Computer Sciences, University of Wisconsin - Madison)
H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration
MORTEZA GHAHREMANI (Technische Universität München) · Mohammad Khateri (University of Eastern Finland) · Bailiang Jian (Technische Universität München) · Benedikt Wiestler (Technical University Munich) · Ehsan Adeli (Stanford University) · Christian Wachinger (Technische Universität München)
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
Peng Lu (SIGS, Tsinghua University) · Tao Jiang (Shanghai AI Laboratory) · Yining Li (Shanghai AI Laboratory) · Xiangtai Li (Nanyang Technological University) · Kai Chen (Shanghai AI Laboratory) · Wenming Yang (Tsinghua University)
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
Nastaran Saadati (Iowa State University) · Minh Pham (New York University) · Nasla Saleem (Iowa State University) · Joshua R. Waite (Iowa State University) · Aditya Balu (Iowa State University) · Zhanhong Jiang (Iowa State University) · Chinmay Hegde (New York University) · Soumik Sarkar (Iowa State University)
Resolution Limit of Single-Photon LIDAR
Stanley H. Chan (Purdue University, USA) · Hashan K Weerasooriya (Purdue University) · Weijian Zhang (Purdue University) · Pamela Abshire (University of Maryland, College Park) · Istvan Gyongy (University of Edinburgh) · Robert Henderson (University of Edinburgh)
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng (Zhejiang University) · Yintong Shang (University of Utah) · Xuan Li (None) · Tianjia Shao (Zhejiang University) · Chenfanfu Jiang (University of California, Los Angeles) · Yin Yang (University of Utah)
Neural Visibility Field for Uncertainty-Driven Active Mapping
Shangjie Xue (Georgia Institute of Technology) · Jesse Dill (Georgia Institute of Technology) · Pranay Mathur (Georgia Institute of Technology) · Frank Dellaert (Georgia Tech) · Panagiotis Tsiotras (Georgia Institute of Technology) · Danfei Xu (Georgia Institute of Technology)
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi (Xi'an Jiaotong University) · Chen Li (Xi'an Jiaotong University) · Tieliang Gong (Xi'an Jiaotong University) · Yefeng Zheng (None) · Huazhu Fu (Institute of High Performance Computing, Singapore, A*STAR)
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision
Chenyangguang Zhang (Tsinghua University) · Guanlong Jiao (Tsinghua University) · Yan Di (Technische Universität München) · Gu Wang (Tsinghua University) · Ziqin Huang (Tsinghua University) · Ruida Zhang (Department of Automation, Tsinghua University) · Fabian Manhardt (Google) · Bowen Fu (Technische Universität München) · Federico Tombari (Google, TUM) · Xiangyang Ji (Tsinghua University)
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Phuc Nguyen (VinAI Research) · Tuan Duc Ngo (UMass Amherst) · Evangelos Kalogerakis (UMass Amherst) · Chuang Gan (MIT-IBM Watson AI Lab) · Anh Tran (VinAI Research) · Cuong Pham (Posts & Telecommunications Institute of Technology and VinAI Research) · Khoi Nguyen (VinAI Research)
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo (Harvard Ophthalmology AI Lab) · MIN SHI (Harvard University) · Muhammad Osama Khan (New York University) · Muhammad Muneeb Afzal (New York University) · Hao Huang (New York University) · Shuaihang Yuan (New York University) · Yu Tian (None) · Luo Song (Mass Eye and Ear) · Ava Kouhana (Harvard Ophthalmology AI lab) · Tobias Elze (Harvard University) · Yi Fang (New York University) · Mengyu Wang (Harvard University)
Class Incremental Learning with Multi-Teacher Distillation
Haitao Wen (University of Electronic Science and Technology of China) · Lili Pan (University of Electronic Science and Technology of China) · Yu Dai (University of Electronic Science and Technology of China) · Heqian Qiu (University of Electronic Science and Technology of China) · Lanxiao Wang (University of Electronic Science and Technology of China) · Qingbo Wu (University of Electronic Science and Technology of China) · Hongliang Li (University of Electronic Science and Technology of China, Tsinghua University)
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe (None) · Satya Narayan Shukla (Meta AI) · Omid Poursaeed (Meta AI) · Michael Ryoo (Stony Brook University) · Tsung-Yu Lin (Department of Computer Science, University of Massachusetts, Amherst)
Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views
Ziwei Zhao (Indiana University) · Yuchen Wang (Indiana University) · Chuhua Wang (Indiana University, Bloomington)
CaDeT: a Causal Disentanglement Approach for Robust Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz (None) · Junrui Zhang (University of Toronto) · Amir Rasouli (Huawei Technologies Canada)
AlignMiF: Geometry-Aligned Multimodal Implicit Field for Enhanced LiDAR-Camera Joint Synthesis
Tao Tang (SYSU) · Guangrun Wang (University of Oxford) · Yixing Lao (None) · Peng Chen (Alibaba Group) · Jie Liu (North China University of Technology) · Liang Lin (SUN YAT-SEN UNIVERSITY, Tsinghua University) · Kaicheng Yu (Alibaba Group) · Xiaodan Liang (Sun Yat-sen University)
GOAT-Bench: A Benchmark for Multi-modal Lifelong Navigation
Mukul Khanna (Georgia Institute of Technology) · Ram Ramrakhya (None) · Gunjan Chhablani (Georgia Institute of Technology) · Sriram Yenamandra (Georgia Institute of Technology) · Theo Gervet (Carnegie Mellon University) · Matthew Chang (University of Illinois, Urbana Champaign) · Zsolt Kira (Georgia Institute of Technology) · Devendra Singh Chaplot (Carnegie Mellon University) · Dhruv Batra (FAIR (Meta) and Georgia Tech) · Roozbeh Mottaghi (Meta)
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem (Brown University) · Conor McCullough (Google) · Randy Hsin (Google) · Chas Leichner (Google) · Shan Li (Google) · In Suk Chong (Google) · Andrew Howard (Google) · Lukasz Lew (Research, Google) · Sherief Reda (Brown University) · Ville-Mikko Rautio (Google) · Daniele Moro (Google Research)
BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed
Abhishek Tandon (Carnegie Mellon University) · Anujraaj Goyal (Carnegie Mellon University) · Henry M. Clever (NVIDIA) · Zackory Erickson (Carnegie Mellon University)
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao (University of Science and Technology of China) · Jianpeng Zhang (None) · Yingda Xia (Alibaba Group) · Tony C. W. MOK (Alibaba DAMO Academy) · Zi Li (Alibaba DAMO Academy) · Xianghua Ye (Zhejiang University) · Le Lu (Alibaba Group) · Jian Zheng (Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences) · Yuxing Tang (Alibaba Group) · Ling Zhang (Alibaba Group)
Boosting Neural Representations for Videos with a Conditional Decoder
XINJIE ZHANG (The Hong Kong University of Science and Technology) · Ren Yang (None) · Dailan He (The Chinese University of Hong Kong) · Xingtong Ge (Beijing Institute of Technology) · Tongda Xu (Tsinghua University) · Yan Wang (Tsinghua University) · Hongwei Qin (SenseTime Co.) · Jun Zhang (The Hong Kong University of Science and Technology)
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang (Shanghai Jiao Tong University) · Charles Herrmann (Google) · Junhwa Hur (Google) · Eric Chen (University of Illinois Urbana-Champaign) · Varun Jampani (Google Research) · Deqing Sun (Google) · Ming-Hsuan Yang (University of California at Merced)
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Minghan Li (The Hong Kong Polytechnic University) · Shuai Li (The Hong Kong Polytechnic University) · Xindong Zhang (The Hong Kong Polytechnic University) · Lei Zhang (The Hong Kong Polytechnic University)
Text-Guided 3D Face Synthesis - From Generation to Editing
Yunjie Wu (NetEase, Inc.) · Yapeng Meng (Tsinghua University, Tsinghua University) · Zhipeng Hu (Leihuo Game, NetEase) · Lincheng Li () · Haoqian Wu (NetEase Fuxi AI Lab) · Kun Zhou (Zhejiang University) · Weiwei Xu (Zhejiang University) · Xin Yu (University of Queensland)
AssistGUI: Task-Oriented PC Graphical User Interface Automation
Difei Gao (None) · Lei Ji (Research, Microsoft) · Zechen Bai (Show Lab, National University of Singapore) · Mingyu Ouyang (National University of Singapore) · Peiran Li (National University of Singapore) · Dongxing Mao (SUTD) · Qin WU (National University of Singapore) · Weichen Zhang (National University of Singapore) · Peiyi Wang (National University of Singapore) · Xiangwu Guo (South China University of Technology) · Hengxu Wang (National University of Singapore) · Luowei Zhou (Google) · Mike Zheng Shou (National University of Singapore)
Don’t drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour (Ecole Nationale des Ponts et Chaussées) · Victor Besnier (Valeo.ai) · Vicky Kalogeiton (Ecole polytechnique, IP Paris) · David Picard (None)
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
Weijia Li (Sun Yat-sen University) · Haote Yang (PJLab) · Zhenghao Hu (SUN YAT-SEN UNIVERSITY) · Juepeng Zheng (Sun Yat-Sen University) · Gui-Song Xia (Wuhan University) · Conghui He (None)
Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization
Takuhiro Kaneko (NTT Corporation)
Learning the 3D Fauna of the Web
Zizhang Li (Zhejiang University) · Dor Litvak (University of Texas at Austin) · Ruining Li (University of Oxford) · Yunzhi Zhang (Stanford University) · Tomas Jakab (University of Oxford) · Christian Rupprecht (University of Oxford) · Shangzhe Wu (Stanford University) · Andrea Vedaldi (University of Oxford) · Jiajun Wu (Stanford University)
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang (Princeton University) · Bhishma Dedhia (Princeton University) · Niraj Jha (Princeton University)
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
Oindrila Saha (University of Massachusetts at Amherst) · Grant Horn (University of Massachusetts at Amherst) · Subhransu Maji (University of Massachusetts, Amherst)
Modular Blind Video Quality Assessment
Wen Wen (City University of Hong Kong) · Mu Li (The Chinese University of Hong Kong, Shenzhen) · Yabin ZHANG (Bytedance) · Yiting Liao (Bytedance) · Junlin Li (ByteDance Inc.) · Li zhang (Bytedance Inc.) · Kede Ma (City University of Hong Kong)
Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions
Yujeong Chae (KAIST) · Hyeonseong Kim (KAIST) · Kuk-Jin Yoon (KAIST)
SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
Jan-Niklas Dihlmann (Eberhard-Karls-Universität Tübingen) · Andreas Engelhardt (University of Tübingen) · Hendrik Lensch (University of Tübingen)
Synergistic Global-space Camera and Human Reconstruction from Videos
Yizhou Zhao (Carnegie Mellon University) · Tuanfeng Y. Wang (None) · Bhiksha Raj (Carnegie Mellon University) · Min Xu (Carnegie Mellon University) · Jimei Yang (Adobe Research) · Chun-Hao P. Huang (Adobe Systems)
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Honghao Chen (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Xiangxiang Chu (MeiTuan) · Renyongjian (University of the Chinese Academy of Sciences) · Xin Zhao (University of Science and Technology Beijing) · Kaiqi Huang (Institute of automation, Chinese academy of science)
Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige (Queensland University of Technology) · Sanoojan Baliah (Mohamed bin Zayed University of Artificial Intelligence) · Malitha Gunawardhana (University of Auckland) · Muhammad Haris Khan (None)
Towards Detailed and Robust 3D Clothed Human Reconstruction with High-Frequency and Low-Frequency Information of Parametric Body Models
Yifan Yang (South China University of Technology) · Dong Liu (South China University of Technology) · Shuhai Zhang (South China University of Technology) · Zeshuai Deng (SCUT) · Zixiong Huang (South China University of Technology) · Mingkui Tan (South China University of Technology)
Snapshot Lidar: Fourier embedding of amplitude and phase for single-image depth reconstruction
Sarah Friday (Dartmouth College) · Yunzi Shi (Dartmouth College) · Yaswanth Kumar Cherivirala (Univ. of Michigan/NVIDIA) · Vishwanath Saragadam (University of California, Riverside) · Adithya Pediredla (Dartmouth College)
Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments
Yusuke Takimoto (Huawei Technologies Japan K.K.) · Hikari Takehara (Huawei Technologies Japan K.K.) · Hiroyuki Sato (Huawei Technologies Japan K.K.) · Zihao Zhu (Keio University) · Bo Zheng (Huawei Technologies Japan)
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Ziyao Huang (Chinese Academy of Sciences) · Fan Tang (Institute of Computing Technology, CAS) · Yong Zhang (Tencent AI Lab) · Xiaodong Cun (Tencent AI Lab) · Juan Cao (Institute of Computing Technology, Chinese Academy of Sciences) · Jintao Li (Institute of Computing Technology, Chinese Academy of Sciences) · Tong-yee Lee (National Cheng Kung University)
Learning Degradation Independent Representations for Camera ISP Pipelines
Yanhui Guo (McMaster University) · Fangzhou Luo (McMaster University) · Xiaolin Wu (McMaster University)
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu (The Chinese University of Hong Kong) · Jinliang Zheng (Tsinghua University) · Yu Liu (The Chinese University of Hong Kong) · Hongsheng Li (The Chinese University of Hong Kong)
MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang (Beihang University) · Shenghao Ren (Nanjing University) · Haolei Yuan (Beijing University of Aeronautics and Astronautics) · Jianhui Zhao (Beijing University of Aeronautics and Astronautics) · Fan Li (Beijing University of Aeronautics and Astronautics) · Shuangpeng Sun (Tsinghua University) · Zhenghao Liang (Tsinghua University) · Tao Yu (Tsinghua University) · Qiu Shen (Nanjing University) · Xun Cao (Nanjing University)
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou (Queen Mary University of London) · Enrique Sanchez (Samsung AI Center Cambridge) · Georgios Tzimiropoulos (Queen Mary University of London)
Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes
YuJie Lu (Donghua University, Shanghai) · Long Wan (Donghua University, Shanghai) · Nayu Ding (Donghua University, Shanghai) · Yulong Wang (Donghua University, Shanghai) · Shuhan Shen (Institute of automation, Chinese academy of science) · Shen Cai (Donghua University) · Lin Gao (University of Chinese Academy of Sciences)
Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion
Sofia Casarin (Free University of Bozen-Bolzano) · Cynthia Ugwu (Free University of Bozen) · Sergio Escalera (Computer Vision Center) · Oswald Lanz (Free University of Bozen-Bolzano)
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He (University of Tübingen) · Garvita Tiwari (University of Tuebingen and MPI-Saarbrucken) · Tolga Birdal (Imperial College London) · Jan Lenssen (Saarland Informatics Campus, Max-Planck Institute) · Gerard Pons-Moll (University of Tübingen)
MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang (The Chinese University of Hong Kong) · Ruiyuan Gao (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Xiaosen Wang (Huazhong University of Science and Technology) · Tsung-Yi Ho (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Xu Nan (Institute of Automation, Chinese Academy of Sciences) · Qiang Xu (The Chinese University of Hong Kong)
Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao (Tsinghua University) · Jingxiang Sun (None) · Cheng Peng (Tsinghua University) · Zerong Zheng (Tsinghua University) · Boyao ZHOU (Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Yebin Liu (Tsinghua University)
ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion
Jiayu Yang (Australian National University) · Ziang Cheng (Australian National University) · Yunfei Duan (Tencent Game) · Pan Ji (Tencent XR Vision Labs) · Hongdong Li (Australian National University)
Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks
Shin'ya Yamaguchi (Kyoto University) · Sekitoshi Kanai (NTT) · Kazuki Adachi (NTT) · Daiki Chijiwa (NTT, The University of Tokyo)
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang (University of Amsterdam) · Hazel Doughty (Leiden University) · Cees G. M. Snoek (University of Amsterdam)
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Zhan Li (OPPO US Research Center & Portland State University) · Zhang Chen (OPPO US Research Center) · Zhong Li (InnoPeak Technology) · Yi Xu (OPPO US Research Center)
LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
Siyuan Cheng (Purdue University) · Guanhong Tao (Purdue University) · Yingqi Liu (Microsoft) · Guangyu Shen (Purdue University) · Shengwei An (Purdue University) · Shiwei Feng (Purdue University, West Lafayette) · Xiangzhe Xu (Purdue University) · Kaiyuan Zhang (Computer Science, Purdue University) · Shiqing Ma (University of Massachusetts at Amherst) · Xiangyu Zhang (Purdue University)
Long-Tailed Anomaly Detection with Learnable Class Names
Chih-Hui Ho (University of California San Diego) · Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories (MERL)) · Nuno Vasconcelos (University of California San Diego)
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
Haoxuanye Ji (Xi'an Jiaotong University) · Pengpeng Liang (Zhengzhou University) · Erkang Cheng (Nullmax)
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen (Carnegie Mellon University) · Wenjin Hou (Huazhong University of Science and Technology) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Fahad Shahbaz Khan (MBZUAI; Linköping University)
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives
Ronghui Li (Tsinghua University) · Yuxiang Zhang (Tsinghua University) · Yachao Zhang (Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Jie Guo (Peng Cheng Laboratory) · Yan Zhang (ETH Zurich) · Yebin Liu (Tsinghua University) · Xiu Li (Tsinghua University)
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
Zining Chen (Beijing University of Posts and Telecommunications) · Weiqiu Wang (Beijing University of Posts and Telecommunications) · Zhicheng Zhao (Beijing University of Posts and Telecommunications) · Fei Su (Beijing University of Posts and Telecommunications) · Aidong Men (Beijing University of Posts and Telecommunications) · Hongying Meng (None)
Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning
Rui Zhao (Xi'an Jiaotong University) · Bin Shi (Xi'an Jiaotong University) · Jianfei Ruan (Xi'an Jiaotong University) · Tianze Pan (Xi'an Jiaotong University) · Bo Dong (Xi'an Jiaotong University)
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Muhammad Sohail Danish (Mohamed bin Zayed University of Artificial Intelligence) · Muhammad Haris Khan (None) · Muhammad Akhtar Munir (None) · M. Sarfraz (Karlsruher Institut für Technologie) · Mohsen Ali (Information Technology University)
Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising
Haijin Zeng (IMEC & Universiteit Gent) · Jiezhang Cao (ETH Zürich) · Yongyong Chen (Harbin Institute of Technology (Shenzhen)) · Kai Zhang (None) · Hiep Luong (Universiteit Gent - IMEC) · Wilfried Philips (Universiteit Gent)
ManiFPT: Defining and Analyzing Fingerprints of Generative Models
Hae Jin Song (University of Southern California) · Mahyar Khayatkhoei (USC/ISI) · Wael AbdAlmageed (Clemson University)
Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
Jihyun Kim (Yonsei University, LG Electronics) · Changjae Oh (Queen Mary University of London) · Hoseok Do (LG Electronics) · Soohyun Kim (Korea University) · Kwanghoon Sohn (Yonsei University)
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
Trung Dao (VinAI) · Duc H Vu (VinAI Research) · Cuong Pham (Posts & Telecommunications Institute of Technology and VinAI Research) · Anh Tran (VinAI Research)
MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang (Zhejiang University) · Siqi Li (Zhejiang University) · Junhao Chen (Zhejiang University) · Zhuangzhi Chen (Zhejiang University of Technology) · Tianxin Huang (Tencent youtu lab) · Linpeng Peng (Zhejiang University) · Yong Liu (Zhejiang University)
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Mingyu Lee (Chung-Ang University, LGCNS) · Jongwon Choi (Chung-Ang University)
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
Pengze Zhang (Sun Yat-sen University) · Hubery Yin (Tencent) · Chen Li (Tencent) · Xiaohua Xie (SUN YAT-SEN UNIVERSITY)
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Myeongseob Ko (Virginia Polytechnic Institute and State University) · Feiyang Kang (Virginia Polytechnic Institute and State University) · Weiyan Shi (Stanford University) · Ming Jin (Virginia Tech) · Zhou Yu (Columbia University) · Ruoxi Jia (Virginia Tech)
Deep Equilibrium Diffusion Restoration with Parallel Sampling
Jiezhang Cao (ETH Zürich) · Yue Shi (Shanghai Jiao Tong University) · Kai Zhang (None) · Yulun Zhang (Shanghai Jiao Tong University) · Radu Timofte (University of Würzburg) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang (Department of Computer Science, National Tsing Hua University) · Cheng-Hsuan Chiang (National Tsing Hua University) · Tzu-Hsien Chen (National Tsing Hua University) · Jun-Xiong Chong (National Tsing Hua University) · Tyng-Luh Liu (IIS/Academia Sinica) · Chiou-Ting Hsu (National Tsing Hua University)
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping SUN (City University of Hong Kong) · Yanjun Wang (Shanghai Jiao Tong University) · Ailing Zeng (IDEA) · Wanqi Yin (SenseTime Research) · Chen Wei (SenseTime International PTE. LTD.) · Wenjia Wang (University of Hong Kong) · Haiy Mei (None) · Chi LEUNG (City University of Hong Kong) · Ziwei Liu (Nanyang Technological University) · Lei Yang (The Chinese University of Hong Kong) · Zhongang Cai (Nanyang Technological University)
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
Junhao Dong (Nanyang Technological University) · Piotr Koniusz (Data61/CSIRO + Australian National University) · Junxi Chen (SUN YAT-SEN UNIVERSITY) · Z. Wang (University of British Columbia) · Yew-Soon Ong (Nanyang Technological University)
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong (State University of New York at Stony Brook) · Bin Chen (Bytedance Inc.) · Xiulong Liu (University of Washington) · Pawel Polak (State University of New York at Stony Brook) · Peng Zhang (Bytedance)
Validating Privacy-Preserving Face Recognition under a Minimum Assumption
Hui Zhang (Anhui University) · Xingbo Dong (Anhui University) · YenLungLai (Anhui University) · Ying Zhou (Anhui University) · Xiaoyan ZHANG (Anhui University) · Xingguo Lv (Anhui University) · Zhe Jin (Anhui University) · Xuejun Li (Anhui University)
Named Entity Driven Zero-Shot Image Manipulation
Zhida Feng (Wuhan University of Science and Technology) · Li Chen (Wuhan University of Science and Technology) · Jing Tian (National University of Singapore) · Jiaxiang Liu (Baidu) · Shikun Feng (Baidu)
Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling
Jiawei Shi (Northwest Polytechnical University Xi'an) · Hui Deng (Northwest Polytechnical University Xi'an) · Yuchao Dai (Northwestern Polytechnical University)
Multiway Point Cloud Mosaicking with Diffusion and Global Optimization
Shengze Jin (Department of Computer Science, ETHZ - ETH Zurich) · Iro Armeni (Stanford University) · Marc Pollefeys (ETH Zurich / Microsoft) · Daniel Barath (ETHZ - ETH Zurich)
IDGuard: Robust, General, Identity-centric POI Proactive Defense Against Face Editing Abuse
Yunshu Dai (SUN YAT-SEN UNIVERSITY) · Jianwei Fei (Nanjing University of Information Science and Technology) · Fangjun Huang (SUN YAT-SEN UNIVERSITY)
Single-Model and Any-Modality for Video Object Tracking
Zongwei Wu (Bayerische Julius-Maximilians-Universität Würzburg) · Jilai Zheng (Shanghai Jiaotong University) · Xiangxuan Ren (Shanghai Jiao Tong University) · Florin-Alexandru Vasluianu (Bayerische Julius-Maximilians-Universität Würzburg) · Chao Ma (Shanghai Jiao Tong University) · Danda Paudel (INSAIT, Sofia University) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Radu Timofte (University of Würzburg)
iKUN: Speak to Trackers without Retraining
Yunhao Du (Beijing University of Posts and Telecommunications) · Cheng Lei (Beijing University of Posts and Telecommunications) · Zhicheng Zhao (Beijing University of Posts and Telecommunications) · Fei Su (Beijing University of Posts and Telecommunications)
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi (The Hong Kong University of Science and Technology) · Jiahao Pan (Hong Kong University of Science and Technology) · Peng Li (Tsinghua University) · Ruibin Yuan (Hong Kong University of Science and Technology) · Xiaowei Chi (Hong Kong University of Science and Technology) · Mengfei Li (Hong Kong University of Science and Technology) · Wenhan Luo (SUN YAT-SEN UNIVERSITY) · Wei Xue (Hong Kong University of Science and Technology) · Shanghang Zhang (Peking University) · Qifeng Liu (The Hong Kong University of Science and Technology) · Yike Guo (Imperial College London)
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
Wenhao Tang (Chongqing University) · Fengtao ZHOU (Department of Computer Science and Engineering, Hong Kong University of Science and Technology) · Sheng Huang (Chongqing University) · Xiang Zhu (Chongqing University) · Yi Zhang (Chongqing University) · Bo Liu (Rutgers University)
SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction
Conghao Wong (Huazhong University of Science and Technology) · Beihao Xia (Huazhong University of Science and Technology) · Ziqian Zou (Huazhong University of Science and Technology) · Yulong Wang (Huazhong Agricultural University) · Xinge You (Huazhong University of Science and Technology)
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei (Nanjing University of Science and Technology) · Tao Chen (None) · Xiruo Jiang (None) · Huafeng Liu (Nanjing University of Science and Technology) · Zeren Sun (Nanjing University of Science and Technology) · Yazhou Yao (Nanjing University of Science and Technology)
Pose-Transformed Equivariant Network for 3D Point Trajectory Prediction
Ruixuan Yu (Shandong University) · Jian Sun (Xi'an Jiaotong University)
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim (Kyung Hee University) · Hyeon Bae Kim (Kyung Hee University) · Jinyoung Moon (ETRI) · Jinwoo Choi (Kyung Hee University) · Seong Tae Kim (Kyung Hee University)
Object Dynamics Modeling with Hierarchical Point Cloud-based Representations
Chanho Kim (Oregon State University) · Li Fuxin (Oregon State University)
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang (Princeton University) · Zhizhou Sha (Tsinghua University) · Zheng Ding (University of California, San Diego) · Yilin Wang (Tsinghua University) · Zhuowen Tu (University of California, San Diego)
Efficient Hyperparameter Optimization with Adaptive Fidelity Identification
Jiantong Jiang (The University of Western Australia) · Zeyi Wen (Hong Kong University of Science and Technology (Guangzhou)) · Atif Mansoor (University of Western Australia) · Ajmal Mian (University of Western Australia)
MESA: Matching Everything by Segmenting Anything
Yesheng Zhang (Shanghai Jiaotong University) · Xu Zhao (Shanghai Jiao Tong University)
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Evonne Ng (University of California, Berkeley) · Javier Romero (None) · Timur Bagautdinov (Reality Labs Research) · Shaojie Bai (Meta) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Angjoo Kanazawa (UC Berkeley) · Alexander Richard (Reality Labs Research, Meta)
SIRA: Scalable Inter-frame Relation and Association for Radar Perception
Ryoma Yataka (Mitsubishi Electric Research Laboratories (MERL)) · Pu (Perry) Wang (None) · Petros Boufounos (Mitsubishi Electric Research Laboratories) · Ryuhei Takahashi (Mitsubishi Electric Corporation)
The More You See in 2D, the More You Perceive in 3D
Xinyang Han (UC Berkeley) · Zelin Gao () · Angjoo Kanazawa (UC Berkeley) · Shubham Goel (Avataar) · Yossi Gandelsman (University of California, Berkeley)
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong (University of Toronto) · Zhiqi Li (Nanjing University) · Yuntao Chen (CAIR, HKISI, CAS) · Feng Wang (Tsinghua University) · Xizhou Zhu (Shanghai AI Laboratory) · Jiapeng Luo (SenseTime Research) · Wenhai Wang (Shanghai AI Laboratory) · Tong Lu (Nanjing University) · Hongsheng Li (The Chinese University of Hong Kong) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Lewei Lu (SenseTime) · Jie Zhou (None) · Jifeng Dai (Tsinghua University)
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen (Samsung Research America) · Nina Shvetsova (None) · Andrew Rouditchenko (Massachusetts Institute of Technology) · Daniel Kondermann (Quality Match GmbH) · Samuel Thomas (IBM Research) · Shih-Fu Chang (Columbia University) · Rogerio Feris (International Business Machines) · James Glass (Massachusetts Institute of Technology) · Hilde Kuehne (University of Bonn MIT-IBM Watson AI Lab)
Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
Youngmin Chung (Sung Kyun Kwan University) · Ji Hun Ha (Sung Kyun Kwan University) · Kyeong Chan Im (Sungkyunkwan University) · Joo Sang Lee (Sungkyunkwan University)
2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images
Junkai Deng (Institute of Software, Chinese Academy of Sciences) · Fei Hou (Institute of Software, Chinese Academy of Sciences) · Xuhui Chen (Institute of Software, Chinese Academy of Sciences) · Wencheng Wang (Institute of Software, Chinese Academy of Sciences) · Ying He (Nanyang Technological University)
Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World
Huiyuan Fu (Beijing University of Posts and Telecommunications) · Fei Peng (Beijing University of Posts and Telecommunications) · Xianwei Li (Beijing University of Posts and Telecommunications) · Yejun Li (Beijing University of Posts and Telecommunications) · Xin Wang (State University of New York at Stony Brook) · Huadong Ma (Beijing University of Post and Telecommunication, Tsinghua University)
ViewDiff: 3D-Consistent Image Generation with Text-To-Image Models
Lukas Hoellein (None) · Aljaž Božič (Facebook) · Norman Müller (Meta) · David Novotny (Facebook) · Hung-Yu Tseng (Meta) · Christian Richardt (Meta Reality Labs) · Michael Zollhoefer (Meta) · Matthias Nießner (Technical University of Munich)
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu (Tsinghua University) · Yuan Yao (Tsinghua University) · Haoye Zhang (Tsinghua University) · Taiwen He (Tsinghua University) · Yifeng Han (Zhejiang University) · Ganqu Cui (Tsinghua University) · Jinyi Hu (Tsinghua University) · Zhiyuan Liu (Tsinghua University) · Hai-Tao Zheng (Tsinghua University) · Maosong Sun (Tsinghua University)
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li (Huazhong University of Science and Technology) · Zhangyang Gao (Westlake University, China) · Cheng Tan (Zhejiang University & Westlake University) · Bocheng Ren (None) · Laurence Yang (Hainan University) · Stan Z. Li (Westlake University)
Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes
Zhiyuan Yu (Na) · Zheng Qin (National University of Defense Technology) · Lintao Zheng (National University of Defense Technology) · Kai Xu (National University of Defense Technology)
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek (Facebook) · Florian Bordes (Meta AI) · Pietro Astolfi (Meta AI) · Mary Williamson (Meta AI (FAIR)) · Vasu Sharma (Meta AI/ CMU) · Adriana Romero-Soriano (Meta)
Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval
Fan Zhang (Georgia Institute of Technology) · Xian-Sheng Hua (Terminus Group) · Chong Chen (Terminus Group) · Xiao Luo (University of California, Los Angeles)
GaussianAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Jing Wen (University of Illinois Urbana-Champaign) · Xiaoming Zhao (UIUC) · Jason Ren (Apple) · Alexander G. Schwing (University of Illinois Urbana-Champaign) · Shenlong Wang (University of Illinois, Urbana Champaign)
Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior
Zhenyu Chen (Nanjing University) · Jie Guo (Nanjing University) · Shuichang Lai (Nanjing University) · Ruoyu Fu (Nanjing University) · Mengxun Kong (None) · Chen Wang (Nanjing University) · Hongyu Sun (Guangdong Oppo Mobile Telecommunications Corp., Ltd) · Zhebin Zhang (OPPO) · Chen Li (Innopeak Technology Inc.) · Yanwen Guo (Nanjing University)
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
ChangHee Yang (LG Electronics) · ChanHee Kang (Sogang University) · Kyeongbo Kong (Pusan National University) · Hanni Oh (Sogang University) · Suk-Ju Kang (Sogang University)
Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Mingcheng Li (Fudan University) · Dingkang Yang (Fudan University) · Xiao Zhao (None) · Shuaibing Wang (Fudan University) · Yan Wang (Fudan University) · Kun Yang (Fudan University) · Mingyang Sun (Fudan University) · Dongliang Kou (Academy for Engineering and Technology, Fudan University, Shanghai, China.) · Qian (Fudan University) · Lihua Zhang (Fudan University)
ES$^3$: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
Yuanhang Zhang (Institute of Computing Technology, Chinese Academy of Sciences) · Shuang Yang (Institute of Computing Technology, Chinese Academy of Sciences) · Shiguang Shan (Institute of Computing Technology, Chinese Academy of Sciences) · Xilin Chen (None)
MSU-4S - The Michigan State University Four Seasons Dataset
Daniel Kent (Michigan State University) · Mohammed Alyaqoub (Michigan State University) · Xiaohu Lu (Michigan State University) · Sayed Khatounabadi (Michigan State University) · Kookjin Sung (Michigan State University) · Cole Scheller (Michigan State University) · Alexander Dalat (University of Michigan - Ann Arbor) · Xinwei Guo (Michigan State University) · Asma Bin Thabit (Michigan State University) · Roberto Muntaner Whitley (Michigan State University) · Hayder Radha (Michigan State University)
Estimating Extreme 3D Image Rotations using Cascaded Attention
Shay Dekel (Bar Ilan University) · Yosi Keller (Bar Ilan University) · Martin Čadík (Brno University of Technology)
KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
Ruida Zhang (Department of Automation, Tsinghua University) · Chenyangguang Zhang (Tsinghua University) · Yan Di (Technische Universität München) · Fabian Manhardt (Google) · Xingyu Liu (Tsinghua University) · Federico Tombari (Google, TUM) · Xiangyang Ji (Tsinghua University)
Taming Stable Diffusion for Text to 360$^{\circ}$ Panorama Image Generation
Cheng Zhang (None) · Qianyi Wu (Monash University) · Camilo Cruz Gambardella (Monash University) · Xiaoshui Huang (Shanghai AI Laboratory) · Dinh Phung (Monash University) · Wanli Ouyang (University of Sydney) · Jianfei Cai (Monash University)
Towards Progressive Multi-Frequency Representation for Image Warping
Jun Xiao (The Hong Kong Polytechnic University) · Zihang Lyu (The Hong Kong Polytechnic University) · Cong Zhang (Hong Kong Polytechnic University) · Yakun Ju (Nanyang Technological University) · Changjian Shui (Vector Institute) · Kin-man Lam (The Hong Kong Polytechnic University)
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh (Australian National University) · Jianming Zhang (Adobe Systems) · Qing Liu (Adobe Systems) · Cameron Smith (Adobe Systems) · Zhe Lin (Adobe Research) · Liang Zheng (Australian National University)
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Kraus (Recursion) · Kian Kenyon-Dean (Recursion Pharma) · Saber Saberian (Recursion Pharma) · Maryam Fallah (Recursion Pharmaceuticals) · Peter McLean (Recursion) · Jess Leung (Recursion) · Vasudev Sharma (Recursion) · Ayla Khan (University of Utah) · Jia Balakrishnan (Recursion Pharmaceuticals) · Safiye Celik (Recursion) · Dominique Beaini (Valence Labs) · Maciej Sypetkowski (Valence Labs) · Chi Cheng (Boston University) · Kristen Morse (Recursion) · Maureen Makes (University of Utah) · Ben Mabey (None) · Berton Earnshaw (University of Utah)
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Fanghua Yu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Jinjin Gu (University of Sydney) · Zheyuan Li (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Jinfan Hu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Xiangtao Kong (Hong Kong Polytechnic University) · Xintao Wang (Tencent) · Jingwen He (Shanghai AI Lab) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Chao Dong (SIAT)
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Qucheng Peng (University of Central Florida) · Ce Zheng (University of Central Florida) · Chen Chen ()
VOODOO 3D: VOlumetric pOrtrait Disentanglement fOr Online 3D head reenactment
Phong Tran (MBZUAI) · Egor Zakharov (ETH Zurich) · Long Nhat Ho (Mohamed bin Zayed University of Artificial Intelligence) · Anh Tran (VinAI Research) · Liwen Hu (Pinscreen) · Hao Li (Mohamed bin Zayed University of Artificial Intelligence)
Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning
Christopher Liao (None) · Theodoros Tsiligkaridis (MIT Lincoln Laboratory, Massachusetts Institute of Technology) · Brian Kulis (Boston University)
Masked and Shuffled Blind Spot Denoising for Real-World Images
Hamadi Chihaoui (University of Bern) · Paolo Favaro (Institute für Informatik, University of Bern)
Open-vocabulary object 6D pose estimation
Jaime Corsetti (Fondazione Bruno Kessler & University of Trento) · Davide Boscaini (Fondazione Bruno Kessler) · Changjae Oh (Queen Mary University London) · Andrea Cavallaro (EPFL - EPF Lausanne) · Fabio Poiesi (Fondazione Bruno Kessler)
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Ziyue Feng (Clemson University) · Huangying Zhan (OPPO US Research Center) · Zheng Chen (Indiana University, Bloomington) · Qingan Yan (OPPO US Research Center) · Xiangyu Xu (None) · Changjiang Cai (None) · Bing Li (Clemson University) · Qilun Zhu (Clemson University) · Yi Xu (OPPO US Research Center)
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Linqi Zhou (Stanford University) · Andy Shih (Stanford University) · Chenlin Meng (None) · Stefano Ermon (Stanford University)
ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing
Jun-Kun Chen (None) · Samuel Rota Bulò (Meta) · Norman Müller (Meta) · Lorenzo Porzi (Facebook) · Peter Kontschieder (Meta) · Yu-Xiong Wang (None)
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Yuhao Liu (City University of Hong Kong) · Zhanghan Ke (City University of Hong Kong) · Fang Liu (City University of Hong Kong) · Nanxuan Zhao (Adobe Research) · Rynson W.H. Lau (City University of Hong Kong)
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu (None) · Guandao Yang (None) · Zhibing Li (The Chinese University of Hong Kong) · Kai Zhang (Adobe Systems) · Ziwei Liu (Nanyang Technological University) · Leonidas Guibas (Stanford University) · Dahua Lin (The Chinese University of Hong Kong) · Gordon Wetzstein (Stanford University)
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Zixiong Huang (South China University of Technology) · Qi Chen (The University of Adelaide) · Libo Sun (University of Adelaide) · Yifan Yang (South China University of Technology) · Naizhou Wang (CVTE research) · Qi Wu (University of Adelaide) · Mingkui Tan (South China University of Technology)
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang (University of Sydney) · Gil Avraham (Amazon) · Yan Zuo (Amazon) · Sameera Ramasinghe (Amazon) · Loris Bazzani (Amazon) · Anton van den Hengel (University of Adelaide)
MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Ziyang Chen (Guizhou University) · Wei Long (Guizhou University) · He Yao (Guizhou University) · Yongjun Zhang (Guizhou University) · Bingshu Wang (Northwest Polytechnical University Xi'an) · Yongbin Qin (Guizhou University) · Jia Wu (Monash University)
Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
Jinguo Luo () · Weihong Ren (Harbin Institute of Technology, Shenzhen) · Weibo Jiang (Harbin Institute of Technology) · Xi'ai Chen (Shenyang Institute of Automation, Chinese Academy of Sciences) · Qiang Wang (Shenyang University) · Zhi Han (Shenyang Institute of Automation, Chinese Academy of Sciences) · Honghai LIU (Harbin Institute of Technology, Shenzhen)
Active Prompt Learning in Vision Language Models
Jihwan Bang (KAIST) · Sumyeong Ahn (Michigan State University) · Jae-Gil Lee (Korea Advanced Institute of Science and Technology)
FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions
Zhen Liu (Nanjing University) · Hao Zhu (Nanjing University) · Qi Zhang (Tencent AI Lab) · Jingde Fu (Nanjing University) · Weibing Deng (Nanjing University) · Zhan Ma (Nanjing University) · Yanwen Guo (Nanjing University) · Xun Cao (Nanjing University)
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang (The Chinese University of Hong Kong, Shenzhen) · Bingliang Li (The Chinese University of Hong Kong (Shenzhen)) · Ailing Zeng (IDEA) · Lei Zhang (International Digital Economy Academy (IDEA)) · Ruimao Zhang (The Chinese University of Hong Kong (Shenzhen))
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu (Huawei) · Yongxin Yang (Queen Mary University of London) · Shifeng Zhang (Huawei Technologies Ltd.) · Fei Chen (Huawei Noah's Ark Lab) · Steven McDonagh (University of Edinburgh) · Gerasimos Lampouras (Huawei Technologies Ltd.) · Ignacio Iacobacci (Huawei Noah's Ark Lab) · Sarah Parisot (Huawei)
Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Juanwu Lu (Purdue University) · Can Cui (Purdue University) · Yunsheng Ma (Purdue University) · Aniket Bera (Purdue University) · Ziran Wang (Purdue University)
VidToMe: Video Token Merging for Zero-Shot Video Editing
Xirui Li (Shanghai Jiaotong University) · Chao Ma (Shanghai Jiao Tong University) · Xiaokang Yang (Shanghai Jiao Tong University, China) · Ming-Hsuan Yang (University of California at Merced)
Text-image Alignment for Diffusion-based Perception
Neehar Kondapaneni (California Institute of Technology) · Markus Marks (California Institute of Technology) · Manuel Knott (ETHZ - ETH Zurich) · Rogério Guimarães (California Institute of Technology) · Pietro Perona (California Institute of Technology)
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim (Seoul National University) · Jungwon Park (Seoul National University) · Wonjong Rhee (Seoul National University)
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing (The Hong Kong University of Science and Technology) · Yingqing He (HKUST) · Zeyue Tian (Hong Kong University of Science and Technology) · Xintao Wang (Tencent) · Qifeng Chen (Hong Kong University of Science and Technology)
Robust Synthetic-to-Real Transfer for Stereo Matching
Jiawei Zhang (Beijing University of Aeronautics and Astronautics) · Jiahe Li (Beijing University of Aeronautics and Astronautics) · Lei Huang (Beihang University) · Xiaohan Yu (Macquarie University) · Lin Gu (RIKEN / the University of Tokyo) · Jin Zheng (Beijing University of Aeronautics and Astronautics) · Xiao Bai (Beijing University of Aeronautics and Astronautics)
Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
Yu Chen (Beijing Waiyan Online Digital Technology Co., Ltd) · Fei Gao (Hangzhou Institute of Technology, Xidian University) · Yanguang Zhang (Hangzhou Dianzi University) · Maoying Qiao (University of Technology Sydney) · Nannan Wang (Xidian University)
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie (German Research Center for Artificial Intelligence) · Alain Pagani (German Research Center for Artificial Intelligence (DFKI)) · Didier Stricker (Universität Kaiserslautern)
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach
TAO MA (Peking University) · Bing Bai (Independent Researcher) · Haozhe Lin (None) · Heyuan Wang (Peking University) · Yu Wang (Qiyuan Lab) · Lin Luo (Peking University) · Lu Fang (Tsinghua University)
Mitigating Motion Blur in Neural Radiance Fields with Events and Frames
Marco Cannici (Robotics and Perception Group, Department of Informatics, University of Zurich) · Davide Scaramuzza (University of Zurich)
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun (Shanghai Jiao Tong University) · Ye Fang (None) · Tong Wu (None) · Pan Zhang (Shanghai Artificial Intelligence Laboratory) · Yuhang Zang (Nanyang Technological University) · Shu Kong (University of Macau, Texas A&M University) · Yuanjun Xiong (Mthreads) · Dahua Lin (The Chinese University of Hong Kong) · Jiaqi Wang (Shanghai AI Laboratory)
Discriminability-Driven Channel Selection for Out-of-Distribution Detection
Yue Yuan (Shandong University) · Rundong He (Shandong University) · Yicong Dong (Shandong University) · Zhongyi Han (Shandong University) · Yilong Yin (Shandong University)
DemoFusion: Democratising High-Resolution Image Generation With No $$$
Ruoyi DU (Beijing University of Posts and Telecommunications) · Dongliang Chang (Tsinghua University) · Timothy Hospedales (None) · Yi-Zhe Song (None) · Zhanyu Ma (Beijing University of Post and Telecommunication)
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun (University of Seoul) · Youngmin Ro (University of Seoul)
Makeup Prior Models for 3D Facial Makeup Estimation and Applications
Xingchao Yang (Cyberagent) · Takafumi Taketomi (CyberAgent) · Yuki Endo (University of Tsukuba) · Yoshihiro Kanamori (University of Tsukuba)
Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras
Ashwath Shetty (Saarland Informatics Campus, Max-Planck Institute) · Marc Habermann (Saarland Informatics Campus, Max-Planck Institute) · Guoxing Sun (Max Planck Institute for Informatics) · Diogo Luvizon (Saarland Informatics Campus, Max-Planck Institute) · Vladislav Golyanik (MPI for Informatics) · Christian Theobalt (MPI Informatik)
Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Youtian Lin (Nanjing University) · Zuozhuo Dai (Alibaba Group) · Siyu Zhu (Fudan University) · Yao Yao (Nanjing University)
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai (University of Science and Technology of China) · Hang Chen (University of Science and Technology of China) · Jun Du (University of Science and Technology of China) · Ruoyu Wang (University of Science and Technology of China) · Shihao Chen (University of Science and Technology of China) · Haotian Wang (University of Science and Technology of China) · Chin-Hui Lee (Georgia Institute of Technology)
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim (Arizona State University) · Kyle Min (Intel Labs) · Maitreya Patel (Arizona State University) · Sheng Cheng (Arizona State University) · 'YZ' Yezhou Yang (Arizona State University)
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
Jakub Micorek (Technische Universität Graz) · Horst Possegger (Graz University of Technology) · Dominik Narnhofer (Technische Universität Graz) · Horst Bischof (Graz University of Technology) · Mateusz Kozinski (Technische Universität Graz)
MuRF: Multi-Baseline Radiance Fields
Haofei Xu (ETH Zurich) · Anpei Chen (Department of Computer Science, ETHZ - ETH Zurich) · Yuedong Chen (Monash University) · Christos Sakaridis (ETH Zurich) · Yulun Zhang (Shanghai Jiao Tong University) · Marc Pollefeys (ETH Zurich / Microsoft) · Andreas Geiger (University of Tübingen) · Fisher Yu (ETH Zurich)
Resource-Efficient Transformer Pruning for Finetuning of Large Models
Fatih Ilhan (Georgia Institute of Technology) · Gong Su (IBM, International Business Machines) · Selim Tekin (College of Computing, Georgia Institute of Technology) · Tiansheng Huang (Georgia Institute of Technology) · Sihao Hu (Georgia Institute of Technology) · Ling Liu (Georgia Institute of Technology)
Referring Image Editing: Object-level Image Editing via Referring Expressions
Chang Liu (None) · Xiangtai Li (Nanyang Technological University) · Henghui Ding (Fudan University)
InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields
Dongqing Wang (EPFL) · Tong Zhang (EPFL) · Alaa Abboud (EPFL - EPF Lausanne) · Sabine Süsstrunk (None)
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang (Peking University) · Jiaming Liu (Peking University) · Chenxuan Li (Peking University) · Yuan Zhang (Peking University) · Ma Junpeng (Peking University) · Xinyu Wei (Peking University) · Kevin Zhang (Peking University) · Maurice Chong (Peking University) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Yijiang Liu (Nanjing University) · Shanghang Zhang (Peking University)
TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
Xuying Zhang (Nankai University) · Bo-Wen Yin (Nankai University) · Yuming Chen (None) · Zheng Lin (Nankai University) · Yunheng Li (Nankai University) · Qibin Hou (Nankai University) · Ming-Ming Cheng (Nankai University, Tsinghua University)
Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model
Runmin Dong (Tsinghua University) · Shuai Yuan (The University of Hong Kong) · Bin Luo (Tsinghua University) · Mengxuan Chen (Tsinghua University) · Jinxiao Zhang (Tsinghua University) · Lixian Zhang (National Supercomputing Center in Shenzhen) · Weijia Li (Sun Yat-sen University) · Juepeng Zheng (Sun Yat-Sen University) · Haohuan Fu (Tsinghua University)
X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
Shuofeng Sun (Beijing University of Posts and Telecommunications) · Yongming Rao (Tsinghua University) · Jiwen Lu (Tsinghua University) · Haibin Yan (Beijing University of Posts and Telecommunications)
Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Marianna Ohanyan (Picsart) · Hayk Manukyan (Picsart AI Research) · Zhangyang Wang (University of Texas at Austin) · Shant Navasardyan (Picsart AI Research) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
HOIAnimator: Text-Prompt Human-Object Animations Generation with Perceptive Diffusion Models
Wenfeng Song (Beijing Information Science and Technology University) · Xinyu Zhang (Beijing Information Science and Technology University) · Shuai Li (Beijing University of Aeronautics and Astronautics) · Yang Gao (Beijing University of Aeronautics and Astronautics) · Aimin Hao (None) · Xia HOU (Beijing Information Science & Technology University) · Chenglizhao Chen (China University of Petroleum) · Ning Li (Beijing Information Science and Technology University) · Hong Qin (Stony Brook University (State University of New York at Stony Brook))
BiPer: Binary Neural Networks using a Periodic Function
Edwin Vargas (Universidad Industrial de Santander) · Claudia Correa (Universidad Industrial de Santander) · Carlos Hinojosa (KAUST) · Henry Arguello (Universidad Industrial de Santander)
Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching
Rui Gong (Nanyang Technological University) · Weide Liu (Harvard University) · ZAIWANG GU (None) · Xulei Yang (Institute for Infocomm Research (I2R), A*STAR) · Jun Cheng (Institute For Infocomm Research, A*STAR)
How Far Can We Compress Instant NGP-Based NeRF?
Yihang Chen (Shanghai Jiao Tong University) · Qianyi Wu (Monash University) · Mehrtash Harandi (Monash University) · Jianfei Cai (Monash University)
Putting the Object Back into Video Object Segmentation
Ho Kei Cheng (University of Illinois Urbana-Champaign) · Seoung Wug Oh (Adobe Systems) · Brian Price (Adobe Research) · Joon-Young Lee (Adobe Research) · Alexander G. Schwing (University of Illinois Urbana-Champaign)
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang (Department of Computer Science and Engineering, Hong Kong University of Science and Technology) · Qiuyu Wang (Ant Group) · Yuxi Xiao (Zhejiang University) · Qingyan Bai (Hong Kong University of Science and Technology) · Juntao Zhang (Hong Kong University of Science and Technology) · Kecheng Zheng (Ant Group) · Xiaowei Zhou (None) · Qifeng Chen (Hong Kong University of Science and Technology) · Yujun Shen (The Chinese University of Hong Kong)
Streaming Dense Video Captioning
Xingyi Zhou (Google) · Anurag Arnab (Google) · Shyamal Buch (Google) · Shen Yan (Google Research) · Austin Myers (Google) · Xuehan Xiong (Google) · Arsha Nagrani (Google) · Cordelia Schmid (Inria / Google)
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu (Peking University) · Chen Li (Tencent ARC Lab) · Yixiao Ge (Tencent) · Thomas H. Li (AIIT, Peking University) · Ying Shan (Tencent) · Ge Li (Peking University Shenzhen Graduate School)
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu (University of Wisconsin-Madison) · Carter Sifferman (University of Wisconsin - Madison) · Sacha Jungerman (University of Wisconsin - Madison) · Yiquan Li (University of Wisconsin - Madison) · Zhiyue Han (None) · Michael Gleicher (Department of Computer Sciences, University of Wisconsin - Madison) · Mohit Gupta (Department of Computer Sciences, University of Wisconsin - Madison) · Yin Li (University of Wisconsin, Madison)
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
Kunyang Zhou (Southeast University)
Dual Prior Unfolding for Snapshot Compressive Imaging
Jiancheng Zhang (Northwestern Polytechnical University Xi'an) · Haijin Zeng (IMEC & Universiteit Gent) · Jiezhang Cao (ETH Zürich) · Yongyong Chen (Harbin Institute of Technology (Shenzhen)) · Dengxiu Yu (Northwest Polytechnical University) · Yinping Zhao (Northwestern Polytechnical University)
Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models
Kota Sueyoshi (Osaka University) · Takashi Matsubara (Hokkaido University)
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
Shahaf Arica (Technion - Israel Institute of Technology) · Or Rubin (Technion - Israel Institute of Technology) · Sapir Gershov (Technion - Israel Institute of Technology) · Shlomi Laufer (Technion)
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min (POSTECH) · Shyamal Buch (Google) · Arsha Nagrani (Google) · Minsu Cho (POSTECH) · Cordelia Schmid (Inria / Google)
Color Shift Estimation-and-Correction for Image Enhancement
Yiyu Li (City University of Hong Kong) · Ke Xu (City University of Hong Kong) · Gerhard Hancke (City University of Hong Kong) · Rynson W.H. Lau (City University of Hong Kong)
Dexterous Grasp Transformer
Guo-Hao Xu (Sun Yat-sen University) · Yi-Lin Wei (SUN YAT-SEN UNIVERSITY) · Dian Zheng (None) · Xiao-Ming Wu (SUN YAT-SEN UNIVERSITY) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY)
Posterior Distillation Sampling
Juil Koo (KAIST) · Chanho Park (KAIST) · Minhyuk Sung (KAIST)
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Inhwan Bae (GIST) · Junoh Lee (Gwangju Institute of Science and Technology) · Hae-Gon Jeon (GIST)
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang (Hong Kong University of Science and Technology) · Yuehuai LIU (Hong Kong University of Science and Technology) · Yu-Wing Tai (None) · Chi-Keung Tang (The Hong Kong University of Science and Technology)
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu (Peking University) · Ran Xu (Beijing University of Posts and Telecommunications) · Senqiao Yang (Harbin Institute of Technology) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Qizhe Zhang (Peking University) · Zehui Chen (University of Science and Technology of China) · Yandong Guo (OPPO Research Institute) · Shanghang Zhang (Peking University)
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X Chen (University of California, Santa Barbara) · Yaron Vaxman (Cloudinary) · Elad Ben Baruch (Cloudinary) · David Asulin (Cloudinary Ltd.) · Aviad Moreshet (Cloudinary) · Kuo-Chin Lien (Layer AI) · Misha Sra (University of California, Santa Barbara) · Pradeep Sen (UC Santa Barbara)
HOI-M$^3$: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang (ShanghaiTech University) · Jingyan Zhang (ShanghaiTech University) · Zining Song (ShanghaiTech University) · Zhanhe Shi (ShanghaiTech University) · Chengfeng Zhao (ShanghaiTech University) · Ye Shi (ShanghaiTech University) · Jingyi Yu (Shanghai Tech University) · Lan Xu (ShanghaiTech University) · Jingya Wang (ShanghaiTech University)
PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
Jinfeng Xu (Huazhong University of Science and Technology) · Siyuan Yang (HUST) · Xianzhi Li (Huazhong University of Science and Technology) · Yuan Tang (Huazhong University of Science and Technology) · yixue Hao (Huazhong University of Science and Technology) · Long Hu (Huazhong University of Science and Technology) · Min Chen (South China University of Technology)
Higher-order Relational Reasoning for Pedestrian Trajectory Prediction
Sungjune Kim (Korea University) · Hyung-gun Chi (Purdue University) · Hyerin Lim (Hyundai Motor Company) · Karthik Ramani (Purdue University) · Jinkyu Kim (Korea University) · Sangpil Kim (Korea University)
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
Zeeshan Hayder (CSIRO) · Xuming He (ShanghaiTech University)
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
Zeliang Zhang (University of Rochester) · Mingqian Feng (University of Rochester) · Zhiheng Li (Amazon AGI) · Chenliang Xu (University of Rochester)
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
Jongha Kim (Korea University) · Jihwan Park (Korea University) · Jinyoung Park (Korea University) · Jinyoung Kim (Korea University) · Sehyung Kim (Korea University) · Hyunwoo J. Kim (Korea University)
MICap: A Unified Model for Identity-aware Movie Descriptions
Haran Raajesh (International Institute of Information Technology, Hyderabad, International Institute of Information Technology Hyderabad) · Naveen Reddy Desanur (International Institute of Information Technology Hyderabad) · Zeeshan Khan (INRIA) · Makarand Tapaswi (IIIT Hyderabad, Wadhwani AI)
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li (Huazhong University of Science and Technology) · Zhiguo Cao () · Yizheng Wu (Nanyang Technological University) · Kewei Wang (Huazhong University of Science and Technology) · Ke Xian (Nanyang Technological University) · Zhe Wang (Sensetime Group Limited) · Guosheng Lin (Nanyang Technological University)
Referring Expression Counting
Siyang Dai (Singapore University of Technology and Design) · Jun Liu (Singapore University of Technology and Design (SUTD)) · Ngai-Man Cheung (Singapore University of Technology and Design)
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Bingchen Li (University of Science and Technology of China) · Xin Li (None) · Hanxin Zhu (University of Science and Technology of China) · YEYING JIN (National University of Singapore) · Ruoyu Feng (University of Science and Technology of China) · Zhizheng Zhang (Microsoft Research) · Zhibo Chen (University of Science and Technology of China)
Robust Emotion Recognition in Context Debiasing
Dingkang Yang (Fudan University) · Kun Yang (Fudan University) · Mingcheng Li (Fudan University) · Shunli Wang (Fudan University) · Shuaibing Wang (Fudan University) · Lihua Zhang (Fudan University)
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Romain Loiseau (IMAGINE - LIGM - ENPC, LASTIG - IGN) · Elliot Vincent (Imagine (LIGM) - Willow (Inria)) · Mathieu Aubry (ENPC) · Loic Landrieu (ENPC, IGN)
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou (None) · Xiaojun Jia (Chinese Academy of Sciences) · Jindong Gu (University of Oxford & Google Research) · Li Liu (University of Oulu) · Siyuan Liang (National University of Singapore) · Bangyan He (Institute of Information Engineering, CAS) · Xiaochun Cao (SUN YAT-SEN UNIVERSITY)
TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video
Minye Wu (KU Leuven) · Zehao Wang (KU Leuven) · Georgios Kouros (Department of Electrical Engineering, KU Leuven, Belgium, KU Leuven) · Tinne Tuytelaars (KU Leuven)
On Scaling up a Multilingual Vision and Language Model
Xi Chen (Google) · Josip Djolonga (Google) · Piotr Padlewski (Google) · Basil Mustafa (Google) · Soravit Changpinyo (Google Research) · Jialin Wu (Google) · Carlos Riquelme Ruiz (Google) · Sebastian Goodman (Google) · Xiao Wang (Google DeepMind) · Yi Tay (Google) · Siamak Shakeri (Research, Google) · Mostafa Dehghani (Google DeepMind) · Daniel Salz (Google) · Mario Lučić (Google) · Michael Tschannen (Google DeepMind) · Arsha Nagrani (Google) · Hexiang Hu (Google Deepmind) · Mandar Joshi (Google DeepMind) · Bo Pang (Google) · Ceslee Montgomery (Google) · Paulina Pietrzyk (Google) · Marvin Ritter (Google DeepMind) · AJ Piergiovanni (Google) · Matthias Minderer (Google) · Filip Pavetic (Google) · Austin Waters (Google) · Gang Li (Google) · Ibrahim Alabdulmohsin (Google) · Lucas Beyer (Google Brain/DM Zürich) · Julien Amelot (Research, Google) · Kenton Lee (Google Research) · Andreas Steiner (Google) · Yang Li (Google) · Daniel Keysers (Google DeepMind) · Anurag Arnab (Google) · Yuanzhong Xu (Google) · Keran Rong (Google Deepmind) · Alexander Kolesnikov (Google) · Mojtaba Seyedhosseini (Google) · Anelia Angelova (Google) · Xiaohua Zhai (Google) · Neil Houlsby (Google) · Radu Soricut (Google)
Understanding Video Transformers via Universal Concept Discovery
Matthew Kowal (York University) · Achal Dave (None) · Rares Andrei Ambrus (Toyota Research Institute) · Adrien Gaidon (Toyota Research Institute (TRI)) · Kosta Derpanis (York University/Samsung) · Pavel Tokmakov (Toyota Research Institute)
Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms
Joren Brunekreef (Netherlands Cancer Institute) · Eric Marcus (Netherlands Cancer Institute) · Ray Sheombarsing (None) · Jan-Jakob Sonke (Netherlands Cancer Institute) · Jonas Teuwen (Netherlands Cancer Institute)
Generative Proxemics: A Prior for 3D Social Interaction from Images
Lea Müller (University of California, Berkeley) · Vickie Ye (University of California, Berkeley) · Georgios Pavlakos (University of Texas at Austin) · Michael J. Black (University of Tübingen) · Angjoo Kanazawa (UC Berkeley)
3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images
Yifang Men (Alibaba Group) · Hanxi Liu (Peking University) · Yuan Yao (Alibaba group) · Miaomiao Cui (Alibaba Group) · Xuansong Xie (Alibaba Group) · Zhouhui Lian (Peking University)
Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
Alessandro Achille (California Institute of Technology) · Greg Ver Steeg (University of California, Riverside) · Tian Yu Liu (University of California, Los Angeles) · Matthew Trager (Amazon) · Carson Klingenberg (Amazon Web Services) · Stefano Soatto (AWS)
DIOD: Self-Distillation Meets Object Discovery
Sandra Kara (CEA) · Hejer AMMAR (CEA) · Julien Denize (CEA) · Florian Chabot (CEA) · Quoc Cuong PHAM (CEA)
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu (University of Pennsylvania) · Lingzhi Zhang (School of Engineering and Applied Science, University of Pennsylvania) · Jianbo Shi (None)
CAT-Seg: Cost Aggregation for Open-vocabulary Semantic Segmentation
Seokju Cho (Korea University) · Heeseong Shin (Korea University) · Sunghwan Hong (Korea University) · Anurag Arnab (Google) · Paul Hongsuck Seo (Google) · Seungryong Kim (Korea University)
FaceLift: Semi-supervised 3D Facial Landmark Localization
David Ferman (Flawless AI) · Pablo Garrido (Flawless AI) · Gaurav Bharaj (Reality Defender Inc)
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori (University of Trento) · Alessandro Conti (University of Trento) · Paolo Rota (University of Trento) · Yiming Wang (Fondazione Bruno Kessler) · Elisa Ricci (University of Trento)
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
Xiaohan Lei () · Min Wang (Institute of Artificial Intelligence, Hefei Comprehensive National Science Center) · Wengang Zhou (University of Science and Technology of China) · Li Li (University of Science and Technology of China) · Houqiang Li (University of Science and Technology of China)
No More Ambiguity in 360$^\circ$ Room Layout via Bi-Layout Estimation
Yu-Ju Tsai (University of California, Merced) · Jin-Cheng Jhang (National Tsing Hua University) · JINGJING ZHENG (None) · Wei Wang (Amazon) · Albert Chen (Amazon) · Min Sun (Amazon/NTHU) · Cheng-Hao Kuo (Amazon) · Ming-Hsuan Yang (University of California at Merced)
PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
Zhengyao Lv (University of Hong Kong) · Yuxiang Wei (The Hong Kong Polytechnic University, Hong Kong Polytechnic University) · Wangmeng Zuo (Harbin Institute of Technology) · Kwan-Yee K. Wong (The University of Hong Kong)
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Harsh Rangwani (Indian Institute of Science) · Pradipto Mondal (Indian Institute of Technology, Kharagpur) · Mayank Mishra (CMU, Carnegie Mellon University) · Ashish Asokan (Indian Institute of Science, Indian institute of science, Bangalore) · R. Venkatesh Babu (Indian Institute of Science)
OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
Geng Xinyu (None) · Jiaming Wang (Harbin Institute of Technology) · Jiawei Gong (Harbin Institute of Technology) · yuerong xue (Harbin Institute of Technology) · Jun Xu (Harbin Institute of Technology) · Fanglin Chen (Harbin Institute of Technology (Shenzhen)) · Xiaolin Huang (Shanghai Jiao Tong University, Tsinghua University)
MorpheuS: Neural Dynamic 360$^{\circ}$ Surface Reconstruction from Monocular RGB-D Video
Hengyi Wang (University College London) · Jingwen Wang (University College London) · Lourdes Agapito (University College London)
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu (Indian Institute of Technology Delhi) · Mayuna Gupta (Indian Institute of Technology, Delhi) · Chetan Madan (Indian Institute of Technology, Delhi) · Pankaj Gupta (PGIMER Chandigarh) · Chetan Arora (Indian Institute of Technology Delhi)
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang (UIUC) · Chejian Xu (University of Illinois at Urbana-Champaign) · Bo Li (UIUC)
DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields
Cheng-You Lu (University of Technology Sydney) · Peisen Zhou (Brown University) · Angela Xing (Brown University) · Chandradeep Pokhariya (International Institute of Information Technology, Hyderabad, International Institute of Information Technology Hyderabad) · Arnab Dey (I3S-CNRS/Université Côte D'Azur) · Ishaan Shah (International Institute of Information Technology, Hyderabad, International Institute of Information Technology Hyderabad) · Rugved Mavidipalli (Brown University) · Dylan Hu (Brown University) · Andrew Comport (CNRS) · Kefan Chen (Brown University) · Srinath Sridhar (None)
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang (Korea University) · Dong In Lee (Korea University) · MinHyuk Jang (Korea University) · Jong Wook Kim (Korea University) · Feng Yang (Google Research) · Sangpil Kim (Korea University)
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu (University of Wisconsin-Madison) · SICHENG MO (University of California, Los Angeles) · Yin Li (University of Wisconsin, Madison)
PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates
Ruoqi Wang (The Hong Kong University of Science and Technology (Guangzhou)) · Zhuoyang Chen (The Hong Kong University of Science and Technology (Guangzhou)) · Jiayi Zhu (Hong Kong University of Science and Technology (Guangzhou)) · Qiong Luo (Hong Kong University of Science and Technology) · Feng Wang (Guangzhou University)
Learning to Predict Activity Progress by Self-Supervised Video Alignment
Gerard Donahue (Northeastern University) · Ehsan Elhamifar (None)
PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
Zhenyu Li (King Abdullah University of Science and Technology) · Shariq Bhat (King Abdullah University of Science and Technology (KAUST)) · Peter Wonka (KAUST)
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie (Shanghai University) · Shengye Yu (Shanghai University) · Qile He (Shanghai University) · Mengtian Li (Shanghai University)
A Vision Check-up for Language Models
Pratyusha Sharma (Massachusetts Institute of Technology) · Tamar Rott Shaham (MIT) · Manel Baradad (Massachusetts Institute of Technology) · Stephanie Fu (University of California, Berkeley) · Adrian Rodriguez-Munoz (Massachusetts Institute of Technology) · Shivam Duggal (Massachusetts Institute of Technology) · Phillip Isola (None) · Antonio Torralba (MIT)
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu (Shanghai Jiaotong University) · Haoning Wu (Shanghai Jiao Tong University) · Yujie Zhong (Meituan Inc.) · Xiaoyun Zhang (Shanghai Jiao Tong University) · Yanfeng Wang (Shanghai Jiao Tong University) · Weidi Xie (Shanghai Jiaotong University)
NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
Zinuo You (ETH Zurich) · Andreas Geiger (University of Tübingen) · Anpei Chen (Department of Computer Science, ETHZ - ETH Zurich)
Fourier-basis functions to bridge augmentation gap: Rethinking frequency augmentation in image classification
Puru Vaish (University of Twente) · Shunxin Wang (University of Twente) · Nicola Strisciuglio (University of Twente)
Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation
Ba Hung Ngo (Chonnam National University) · Nhat-Tuong Do-Tran (National Yang Ming Chiao Tung University) · Tuan-Ngoc Nguyen (FPT Telecom) · Hae-Gon Jeon (GIST) · Tae Jong Choi (Chonnam National University)
Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
Yuxi Wei (None) · Zi Wang (CMU, Carnegie Mellon University) · Yifan Lu (Shanghai Jiaotong University) · Chenxin Xu (Shanghai Jiao Tong University & National University of Singapore) · Changxing Liu (Shanghai Jiaotong University) · Hao Zhao (Tsinghua University, Tsinghua University) · Siheng Chen (Shanghai Jiao Tong University) · Yanfeng Wang (Shanghai Jiao Tong University)
Region-Based Representations Revisited
Michal Shlapentokh-Rothman (University of Illinois at Urbana Champaign) · Ansel Blume (University of Illinois Urbana Champaign) · Yao Xiao (University of Illinois at Urbana-Champaign) · Yuqun Wu (Department of Computer Science) · Sethuraman T V (Department of Computer Science) · Heyi Tao (University of Illinois at Urbana-Champaign) · Jae Yong Lee (University of Illinois at Urbana-Champaign) · Wilfredo Torres-Calderon (Reconstruct) · Yu-Xiong Wang (None) · Derek Hoiem (University of Illinois at Urbana-Champaign)
FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures
Lisa Mais (Max Delbrück Center for Molecular Medicine) · Peter Hirsch (Max Delbrück Center for Molecular Medicine) · Claire Managan (HHMI Janelia Research Campus) · Ramya Kandarpa (Environmental Resources Management (ERM)) · Josef Rumberger (Max Delbrück Center for Molecular Medicine) · Annika Reinke (German Cancer Research Center) · Lena Maier-Hein (German Cancer Research Center (DKFZ)) · Gudrun Ihrke (HHMI Janelia Research Campus) · Dagmar Kainmueller (Universität Potsdam)
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao (None) · Angela Yao (National University of Singapore) · Yicong Li (National University of Singapore) · Tat-seng Chua (National University of Singapore)
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou (National Technical University of Athens) · Spyros Gidaris (Valeo.ai) · Konstantinos Karantzalos (IMIS - "Athena" Research Center) · Nikos Komodakis (University of Crete)
ToNNO: Tomographic Reconstruction of a Neural Network’s Output for Weakly Supervised Segmentation of 3D Medical Images
Marius Schmidt-Mengin (None) · Alexis Benichoux (INRIA) · Shibeshih Belachew (Therapanacea) · Nikos Komodakis (University of Crete) · Nikos Paragios (Ecole Centrale de Paris)
Physics-aware Hand-object Interaction Denoising
Haowen Luo (Tsinghua University, Tsinghua University) · Yunze Liu (None) · Li Yi ()
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen (University of Arkansas) · Pha Nguyen (University of Arkansas) · Khoa Luu (University of Arkansas)
Disentangled Pre-training for Human-Object Interaction Detection
Zhuolong Li (South China University of Technology) · Xingao Li (South China University of Technology) · Changxing Ding (South China University of Technology) · Xiangmin Xu (South China University of Technology)
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu (University of Science and Technology of China) · Huabin Liu (Shanghai Jiao Tong University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Xiao Sun (Shanghai Artificial Intelligence Laboratory)
GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection
Xiaotian Li (Nanjing University of Posts and Telecommunications) · Baojie Fan (Nanjing University of Posts and Telecommunications) · Jiandong Tian (The Shenyang Institute of Automation, Chinese Academy of Sciences) · Huijie Fan (None)
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai (University of Science and Technology of China) · Qingsong Yao (University of the Chinese Academy of Sciences) · Zihang Jiang (University of Science and Technology of China) · Rongsheng Wang (University of Science and Technology of China) · Zhiyang He (Xunfei Healthcare Technology Co., Ltd.) · Xiaodong Tao (Xunfei Healthcare Co. Ltd) · S Kevin Zhou (University of Science and Technology of China)
$\textbf{LaRE}^2$: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
Yunpeng Luo (Tencent Youtu Lab) · Junlong Du (Tencent YouTu Lab) · Ke Yan () · Shouhong Ding (Tencent Youtu Lab)
Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack
Sabbir Ahmed (State University of New York at Binghamton) · RANYANG ZHOU (New Jersey Institute of Technology) · Shaahin Angizi (New Jersey Institute of Technology) · Adnan Rakin Rakin (None)
ProMotion: Prototypes As Motion Learners
Yawen Lu (Purdue University) · Dongfang Liu (Rochester Institute of Technology) · Qifan Wang (Meta AI) · Cheng Han (Rochester Institute of Technology) · Yiming Cui (University of Florida) · Zhiwen Cao (Purdue University) · Xueling Zhang (Rochester Institute of Technology) · Yingjie Victor Chen (Purdue University) · Heng Fan (University of North Texas)
Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping
Ruoxi Zhu (Fudan University) · Shusong Xu (Alibaba Group) · Peiye Liu (Alibaba Group) · Sicheng Li (Alibaba Group) · Yanheng Lu (Alibaba Group) · Dimin Niu (Alibaba Group) · Zihao Liu (Alibaba Group) · Zihao Meng (Alibaba Group) · Li Zhiyong (Alibaba Group) · Xinhua Chen (Fudan University) · Yibo Fan (Fudan University)
Mask Grounding for Referring Image Segmentation
Yong Xien Chng (None) · Henry Zheng (Tsinghua University) · Yizeng Han (Tsinghua University, Tsinghua University) · Xuchong QIU (Bosch) · Gao Huang (Tsinghua University, Tsinghua University)
SignGraph: A Sign Sequence is Worth Graphs of Nodes
Shiwei Gan (None) · Yafeng Yin (Nanjing University) · Zhiwei Jiang (Nanjing University) · Hongkai Wen (University of Warwick) · Lei Xie (Nanjing University) · Sanglu Lu (Nanjing University)
HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
Yi ZHOU (Samsung Research China-Beijing(SRCB)) · Hui Zhang (Samsung Research China-Beijing(SRCB)) · Jiaqian Yu (Samsung R&D Institute China - Beijing) · yifan yang (Samsung) · Sangil Jung (Samsung) · Seung-In Park (Samsung Advanced Institute of Technology) · ByungIn Yoo (Samsung Advanced Institute of Technology)
$V_kD$: Improving knowledge distillation using orthogonal projections
Roy Miles (Imperial College London) · Ismail Elezi (Huawei Noah's Ark) · Jiankang Deng (Imperial College London & Huawei UKRD)
DiLiGenRT: A Photometric Stereo Dataset with Quantified Roughness and Translucency
Heng Guo (Beijing University of Posts and Telecommunications) · Jieji Ren (Shanghai Jiao Tong University) · Feishi Wang (Peking University) · Boxin Shi (Peking University) · Mingjun Ren (Shanghai Jiaotong University) · Yasuyuki Matsushita (Osaka University)
Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation
Fahimeh Hosseini Noohdani (Sharif University of Technology) · Parsa Hosseini (Sharif University of Technology) · Aryan Yazdan Parast (Sharif University of Technology) · Hamidreza Araghi (Sharif University of Technology) · Mahdieh Baghshah (Sharif University of Technology)
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni (The Pennsylvania State University) · Bernhard Egger (Massachusetts Institute of Technology) · Suhas Lohit (Mitsubishi Electric Research Labs) · Anoop Cherian (Mitsubishi Electric Research Labs (MERL)) · Ye Wang (Mitsubishi Electric Research Labs) · Toshiaki Koike-Akino (Mitsubishi Electric Research Labs. (MERL)) · Sharon X. Huang (Pennsylvania State University) · Tim Marks (None)
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Nilesh Kulkarni (None) · Davis Rempe (NVIDIA) · Kyle Genova (Google) · Abhijit Kundu (Google) · Justin Johnson (University of Michigan) · David Fouhey (New York University) · Leonidas Guibas (Stanford University)
Coherence As Texture -- Passive Textureless 3D Reconstruction by Self-interference
Wei-Yu Chen (Carnegie Mellon University) · Aswin C. Sankaranarayanan (Carnegie Mellon University) · Anat Levin (Weizmann Institute of Science) · Matthew O’Toole (Carnegie Mellon University)
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
Feilong Tang (Monash University) · Zhongxing Xu (Weill Cornell Medicine, Cornell University) · Zhaojun QU (Xi'an Jiaotong-Liverpool University) · Wei Feng (Monash University) · xingjian jiang (University of Michigan - Ann Arbor) · Zongyuan Ge (Monash University)
Unsupervised Universal Image Segmentation
Xudong Wang (Electrical Engineering & Computer Science Department, University of California Berkeley) · Dantong Niu (University of California, Berkeley) · Xinyang Han (UC Berkeley) · Long Lian (University of California, Berkeley) · Roei Herzig (Tel Aviv University) · Trevor Darrell (Electrical Engineering & Computer Science Department)
Space-time Diffusion Features for Zero-shot Text-driven Motion Transfer
Rafail Fridman (Weizmann Institute of Science) · Danah Yatim (Weizmann Institute of Science) · Omer Bar-Tal (Weizmann Institute of Science) · Yoni Kasten (NVIDIA Research) · Tali Dekel (Weizmann Institute of Science)
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Shenhan Qian (Technical University of Munich (TUM)) · Tobias Kirschstein (Department of Informatics, Technische Universität München) · Liam Schoneveld (Woven by Toyota) · Davide Davoli (Toyota Motor Europe NV/SA associated partner by contracted services) · Simon Giebenhain (Technische Universität München) · Matthias Nießner (Technical University of Munich)
CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Sachin Shah (University of Maryland, College Park) · Matthew Chan (Department of Computer Science, University of Maryland, College Park) · Haoming Cai (University of Maryland, College Park) · Jingxi Chen (University of Maryland College Park) · Sakshum Kulshrestha (University of Maryland, College Park) · Chahat Deep Singh (University of Maryland, College Park) · Yiannis Aloimonos (University of Maryland, College Park) · Christopher Metzler (University of Maryland, College Park)
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Chen (Technische Universität München) · Haoxuan Li (Technische Universität München) · Hsin-Ying Lee (Snap Inc.) · Sergey Tulyakov (Snap Inc.) · Matthias Nießner (Technical University of Munich)
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Ajian Liu (NLPR, CASIA) · Shuai Xue (Beijing Institute of Technology) · Gan Jianwen (Macao University of Science and Technology) · Jun Wan () · Yanyan Liang (Macau University of Science and Technology) · Jiankang Deng (Imperial College London & Huawei UKRD) · Sergio Escalera (Computer Vision Center) · Zhen Lei (Institute of Automation, Chinese Academy of Sciences)
Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
Hong Chen (Huazhong University of Science and Technology) · Pei Yan (Huazhong University of Science and Technology) · sihe xiang (None) · Yihua Tan (Huazhong University of Science and Technology)
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim (Korea University) · Eulrang Cho (Samsung Research) · Sehyung Kim (Korea University) · Hyunwoo J. Kim (Korea University)
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
Xiaobao Wei (University of the Chinese Academy of Sciences) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Jiarui Wu (Beijing University of Aeronautics and Astronautics) · Jiaming Liu (Peking University) · Ming Lu (Intel Labs China) · Yandong Guo (OPPO Research Institute) · Shanghang Zhang (Peking University)
MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Yanting Wang (Pennsylvania State University) · Hongye Fu (Zhejiang University) · Wei Zou (Pennsylvania State University) · Jinyuan Jia (Pennsylvania State University)
Mudslide: A Universal Nuclear Instance Segmentation Method
Jun Wang (Peking University)
Long-Tail Class Incremental Learning via Independent Sub-prototype Construction
Xi Wang (Xidian University) · Xu Yang (Xi'an University of Electronic Science and Technology) · jie yin (None) · Kun Wei (Xidian University) · Cheng Deng (Xidian University)
Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World
Wen Yin (Huazhong University of Science and Technology) · Jian Lou (Zhejiang University) · Pan Zhou (Huazhong University of Science and Technology) · Yulai Xie (Huazhong University of Science and Technology) · Dan Feng (Huazhong University of Science and Technology) · Yuhua Sun (None) · Tailai Zhang (Huazhong University of Science and Technology) · Lichao Sun (Lehigh University)
G3DR: Generative 3D Reconstruction in ImageNet
Pradyumna Reddy () · Ismail Elezi (Huawei Noah's Ark) · Jiankang Deng (Imperial College London & Huawei UKRD)
Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception
Lei Fan (Northwestern University) · Mingfu Liang (Northwestern University) · Yunxuan Li (Northwestern University) · Gang Hua (Wormpex AI Research) · Ying Wu (Northwestern University)
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong (Fudan University) · Shilin Yan (Fudan University) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Wanyun Li (Fudan University) · Xinyu Zhou (None) · Pinxue Guo (Fudan University) · Kaixun Jiang (Fudan University) · Yiting Cheng (None) · Jinglun Li (None) · Zhaoyu Chen (Fudan University) · Wenqiang Zhang (None)
Diffusion Time-step Curriculum for One Image to 3D Generation
YI Xuanyu (Nanyang Technological University) · Zike Wu (Nanyang Technological University) · Qingshan Xu (Nanyang Technological University) · Pan Zhou (Sea Group) · Joo Lim (I2R, A*STAR) · Hanwang Zhang (Nanyang Technological University)
Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
Muhammad Kashif Ali (Hanyang University) · Eun Woo Im (Hanyang University) · Dongjin Kim (Hanyang University) · Tae Hyun Kim (Hanyang Univ.)
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar (Michigan State University) · Yuliang Guo (Bosch US Research) · Xinyu Huang (Robert Bosch Research NA) · Liu Ren (Bosch Research) · Xiaoming Liu (None)
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang (Beijing Institute of Technology) · Yixin Chen (BIGAI) · Baoxiong Jia (Beijing Institute for General Artificial Intelligence (BIGAI)) · Puhao Li (Department of Automation, Tsinghua University) · Jinlu Zhang (Peking University) · Jingze Zhang (Tsinghua University, Tsinghua University) · Tengyu Liu (None) · Yixin Zhu (Peking University) · Wei Liang (Beijing Institute of Technology) · Siyuan Huang (Beijing Institute of General Artificial Intelligence)
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero (Polytechnic Institute of Turin) · Gabriele Rosi (Polytechnic Institute of Turin - Focoos AI) · Claudia Cuttano (Polytechnic Institute of Turin) · Francesca Pistilli (Polytechnic Institute of Turin) · Marco Ciccone (Politecnico di Torino) · Giuseppe Averta (Polytechnic of Turin) · Fabio Cermelli (Politecnico di Torino)
Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng (ByteDance) · Guoqiang Wei (ByteDance) · Jiani Zheng (None) · Jiaxin Zou (ByteDance Ltd.) · Yang Wei (East China Normal University) · Yuchen Zhang (ByteDance Research) · Hang Li (ByteDance Technology)
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
Hyunwoo Ryu (None) · Jiwoo Kim (Yonsei University) · Hyunseok An (Yonsei University) · Junwoo Chang (Yonsei University) · Joohwan Seo (University of California, Berkeley) · Taehan Kim (Samsung) · Yubin Kim (Massachusetts Institute of Technology) · Chaewon Hwang (Ewha Womans University) · Jongeun Choi (Yonsei University) · Roberto Horowitz (University of California, Berkeley)
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang (ShanghaiTech University) · Molin Huang (Shanghaitech University) · Yuxuan Zhou (None) · Juze Zhang (ShanghaiTech University) · Jingyi Yu (ShanghaiTech University) · Jingya Wang (ShanghaiTech University) · Lan Xu (ShanghaiTech University)
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Julio Silva-Rodríguez (ETS Montreal) · Sina Hajimiri (École de technologie supérieure, Université du Québec) · Ismail Ben Ayed (ETS Montreal) · Jose Dolz (École de technologie supérieure)
DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos
Arjun Balasingam (Massachusetts Institute of Technology) · Joseph Chandler (Massachusetts Institute of Technology) · Chenning Li (None) · Zhoutong Zhang (Adobe Systems) · Hari Balakrishnan (Massachusetts Institute of Technology)
Can Biases in ImageNet Models Explain Generalization?
Paul Gavrikov (Offenburg University) · Janis Keuper (Institute for Machine Learning and Analytics, Offenburg University)
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Tianyu Huang (Harbin Institute of Technology & City University of Hong Kong) · Yihan Zeng (Huawei Technologies Ltd.) · Zhilu Zhang (Harbin Institute of Technology) · Wan Xu (Harbin Institute of Technology) · Hang Xu (Huawei Noah's Ark Lab) · Songcen Xu (Huawei Noah's Ark Lab) · Rynson W.H. Lau (City University of Hong Kong) · Wangmeng Zuo (Harbin Institute of Technology)
Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline
Anas Al-lahham (Mohamed bin Zayed University of Artificial Intelligence) · Muhammad Zaigham Zaheer (Mohamed bin Zayed University of Artificial Intelligence) · Nurbek Tastan (Mohamed bin Zayed University of Artificial Intelligence) · Karthik Nandakumar (Mohamed Bin Zayed University of Artificial Intelligence)
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
Boyang Peng (Tongji University) · Sanqing Qu (Tongji University) · Yong Wu (Tongji University) · Tianpei Zou (Tongji University) · Lianghua He (Tongji University) · Alois Knoll (Technical University Munich) · Guang Chen (Tongji University) · Changjun Jiang (Tongji University)
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo (National University of Singapore) · Guangzhi Wang (National University of Singapore) · Mohan Kankanhalli (National University of Singapore)
Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
Shiyan Chen (Peking University) · Jiyuan Zhang (Peking University) · Zhaofei Yu (Peking University) · Tiejun Huang (Peking University)
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge
Bo Zou (Computer Science, Tsinghua University) · Shaofeng Wang (Capital Medical University) · Hao Liu (Tsinghua University) · Gaoyue Sun (Imperial College London) · Yajie Wang (Tsinghua University) · Zuo FeiFei (LargeV .Inc) · Chengbin Quan (Tsinghua University) · Youjian Zhao (Tsinghua University)
Multi-Level Neural Scene Graphs for Dynamic Urban Environments
Tobias Fischer (ETH Zurich) · Lorenzo Porzi (Facebook) · Samuel Rota Bulò (Meta) · Marc Pollefeys (ETH Zurich / Microsoft) · Peter Kontschieder (Meta)
Differentiable Display Photometric Stereo
Seokjun Choi (Pohang University of Science and Technology) · Seungwoo Yoon (POSTECH) · Giljoo Nam (Meta) · Seungyong Lee (POSTECH) · Seung-Hwan Baek (POSTECH)
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li (Nankai University & Tencent) · Mingdeng Cao (The University of Tokyo) · Xintao Wang (Tencent) · Zhongang Qi (Tencent PCG ARC Lab) · Ming-Ming Cheng (Nankai University, Tsinghua University) · Ying Shan (Tencent)
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
Zike Wu (Nanyang Technological University) · Pan Zhou (Sea Group) · YI Xuanyu (National Technological University) · Xiaoding Yuan (Johns Hopkins University) · Hanwang Zhang (Nanyang Technological University)
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli (Rutgers University) · Jindong Jiang (Rutgers University) · Di Liu (Rutgers University, New Brunswick) · Licheng Yu (None) · Xiaoliang Dai (Facebook) · Ankit Ramchandani (Meta Platforms, Inc.) · Guan Pang (Facebook) · Dimitris N. Metaxas (Rutgers) · Praveen Krishnan (Meta AI)
Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification
Jiancheng Zhang (Northwestern Polytechnical University Xi'an) · Haijin Zeng (IMEC & Universiteit Gent) · Yongyong Chen (Harbin Institute of Technology (Shenzhen)) · Dengxiu Yu (Northwest Polytechnical University) · Yinping Zhao (Northwestern Polytechnical University)
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do (Ajou University) · Taehoon Kim (Ajou University) · JAEMIN NA (Tech. Innovation Group, KT) · Jiwon Kim (Robotics Lab, Hyundai Motor Company) · Keonho LEE (Hyundai Motor Company) · Kyunghwan Cho (Hyundai Motor Company) · Wonjun Hwang (Ajou University)
MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying
Ryan Burgert (Stony Brook University) · Brian Price (Adobe Research) · Jason Kuen (Adobe Research) · Yijun Li (Adobe Research) · Michael Ryoo (Stony Brook University)
Self-Supervised Multi-Object Tracking with Path Consistency
Zijia Lu (Northeastern University) · Bing Shuai (Amazon Web Service) · Yanbei Chen (Amazon) · Zhenlin Xu (Amazon) · Davide Modolo (Amazon)
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
Chaokang Jiang () · Guangming Wang (University of Cambridge) · Jiuming Liu (Shanghai Jiao Tong University) · Hesheng Wang (Shanghai Jiao Tong University) · Zhuang Ma (PhiGent) · Zhenqiang Liu (None) · LIANG (None) · Yi Shan (PhiGent Robotics) · Dalong Du (PhiGent Robotics)
SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction
Zhiyang Yao (Department of Electronic Engineering, Tsinghua University) · Shuyang Liu (Tsinghua University) · Xiaoyun Yuan (Tsinghua University) · Lu Fang (Tsinghua University)
Boosting Image Restoration via Priors from Pre-trained Models
Xiaogang Xu (Zhejiang Lab) · Shu Kong (University of Macau, Texas A&M University) · Tao Hu (National University of Singapore) · Zhe Liu (Zhejiang Lab) · Hujun Bao (Zhejiang University)
Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory
Fei Ye (University of York) · Adrian Bors (MBZUAI)
CPR-Coach: Recognizing Composite Error Actions based on Single-class Training
Shunli Wang (Fudan University) · Shuaibing Wang (Fudan University) · Dingkang Yang (Fudan University) · Mingcheng Li (Fudan University) · Haopeng Kuang (Fudan University) · Xiao Zhao (None) · Liuzhen Su (Fudan University) · Peng Zhai (Fudan University) · Lihua Zhang (Fudan University)
DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting
Demin Yu (Harbin Institute of Technology) · Xutao Li (Harbin Institute of Technology, Shenzhen) · Yunming Ye (Harbin Institute of Technology, Shenzhen) · Baoquan Zhang (Harbin Institute of Technology (Shenzhen)) · Luo Chuyao (None) · Kuai Dai (Harbin Institute of Technology) · wangrui (Meteorological Bureau of Shenzhen Municipality) · Chenxunlai (Shenzhen Meteorological Bureau)
Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
Nobuhiko Wakai (Panasonic Holdings Corporation) · Satoshi Sato (Panasonic Holdings Corporation) · Yasunori Ishii (Panasonic Holdings Corporation) · Takayoshi Yamashita (Chubu University)
Rethinking Boundary Discontinuity Problem for Oriented Object Detection
Hang Xu (Hangzhou Dianzi University) · Xinyuan Liu (Institute of Computing Technology, Chinese Academy of Sciences) · Haonan Xu (ICT, Chinese Academy of Sciences) · Yike Ma (Chinese Academy of Sciences) · Zunjie Zhu (Hangzhou Dianzi University) · Chenggang Yan (Hangzhou Dianzi University, Tsinghua University) · Feng Dai (ICT, Chinese Academy of Sciences)
Restoration by Generation with Constrained Priors
Zheng Ding (University of California, San Diego) · Xuaner Zhang (Adobe) · Zhuowen Tu (University of California, San Diego) · Zhihao Xia (Adobe Systems)
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim (Kyung Hee University) · Sung Jin Um (Kyung Hee University) · Sangmin Lee (University of Illinois Urbana-Champaign) · Jung Uk Kim (Kyung Hee University)
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection
Xuanyu Zhang (Peking University) · Runyi Li (Peking University) · Jiwen Yu (Peking University) · Youmin Xu (Peking University) · Weiqi Li (Peking University) · Jian Zhang (Peking University)
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve (Amazon) · Fan Fei (Amazon) · Jayakrishnan Unnikrishnan (Amazon) · Son Dinh Tran (Amazon) · Benjamin Yao (Amazon) · Belinda Zeng (Amazon) · Mubarak Shah (University of Central Florida) · Trishul Chilimbi (Department of Computer Science, University of Wisconsin - Madison)
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Noor Ahmed (Saarland Informatics Campus, Max-Planck Institute) · Anna Kukleva (MPII) · Bernt Schiele (Max Planck Institute for Informatics)
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao (Peking University) · Yuntao Du (Nanjing University) · Xintong Zhang (Beijing Institute for General Artificial Intelligence) · Xiaojian Ma (University of California, Los Angeles) · Wenjuan Han (Beijing Jiaotong University) · Song-Chun Zhu (UCLA) · Qing Li (Beijing Institute for General Artificial Intelligence (BIGAI))
NeuRAD: Neural Rendering for Autonomous Driving
Adam Tonderski (Lund University) · Carl Lindström (Chalmers University of Technology) · Georg Hess (Chalmers University of Technology) · William Ljungbergh (Linköping University Zenseact) · Lennart Svensson (Chalmers University of Technology) · Christoffer Petersson (Zenseact)
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen (Shanghai Artificial Intelligence Laboratory) · Guanglu Song (Sensetime X-Lab) · Zeyue Xue (The University of Hong Kong) · Fu-Yun Wang (The Chinese University of Hong Kong) · Yu Liu (The Chinese University of Hong Kong)
Poly Kernel Inception Network for Remote Sensing Detection
Xinhao Cai (Nanjing University of Science and Technology) · Qiuxia Lai (Communication University of China) · Yuwei Wang (Nanjing University of Science and Technology) · Wenguan Wang (Zhejiang University) · Zeren Sun (Nanjing University of Science and Technology) · Yazhou Yao (Nanjing University of Science and Technology)
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
Mengyao Lyu (Tsinghua University) · Yuhong Yang () · Haiwen Hong (Alibaba Group) · Hui Chen (Tsinghua University) · Xuan Jin (University of Science and Technology of China) · Yuan He (Alibaba Group) · Hui Xue (Zhejiang University, Tsinghua University) · Jungong Han (Aberystwyth University) · Guiguang Ding (Tsinghua University)
Learned Lossless Image Compression based on Bit Plane Slicing
Zhe Zhang (Wuhan University) · Huairui Wang (Wuhan University) · Zhenzhong Chen (Wuhan University) · Shan Liu (Tencent Media Lab)
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Hongwei Zheng (Meituan) · Linyuan Zhou (Meituan) · Han Li (Shanghai Jiaotong University) · Jinming Su (Meituan) · Xiaoming Wei (Meituan) · Xu Xiaoming (Meituan)
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki (Carnegie Mellon University) · Vasu Agrawal (Meta Reality Labs Research) · Samuel Rota Bulò (Meta) · Lorenzo Porzi (Facebook) · Peter Kontschieder (Meta) · Deva Ramanan (Carnegie Mellon University) · Michael Zollhoefer (Meta) · Christian Richardt (Meta Reality Labs)
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang (Shanghai Jiao Tong University) · Yue Xu (Shanghai Jiao Tong University) · Cewu Lu (Shanghai Jiao Tong University) · Yonglu Li (Shanghai Jiaotong University)
NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
Zhenggang Tang (UIUC) · Jason Ren (Apple) · Xiaoming Zhao (UIUC) · Bowen Wen (NVIDIA) · Jonathan Tremblay (NVIDIA) · Stan Birchfield (NVIDIA) · Alexander G. Schwing (University of Illinois Urbana-Champaign)
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Suraj Patni (Indian Institute of Technology, Delhi) · Aradhye Agarwal (Indian Institute of Technology Delhi) · Chetan Arora (Indian Institute of Technology Delhi)
Prompt-Driven Referring Image Segmentation with Instance Contrasting
Chao Shang (None) · Zichen Song (University of Electronic Science and Technology of China) · Heqian Qiu (University of Electronic Science and Technology of China) · Lanxiao Wang (University of Electronic Science and Technology of China) · Fanman Meng (University of Electronic Science and Technology of China) · Hongliang Li (University of Electronic Science and Technology of China, Tsinghua University)
CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li (Shanghai AI Lab) · Jianglin Fu (Shanghai AI Laboratory) · Kaiyuan Liu (None) · Wentao Wang (Shanghai AI Laboratory) · Kwan-Yee Lin (The Chinese University of Hong Kong) · Wayne Wu (None)
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen (University of Adelaide) · Yuyuan Liu (University of Adelaide) · Hu Wang (The University of Adelaide) · Fengbei Liu (Cornell University) · Chong Wang (University of Adelaide) · Helen Frazer (BreastScreen Victoria) · Gustavo Carneiro (University of Surrey)
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Yutong Xie (University of Adelaide) · Qi Chen (The University of Adelaide) · Sinuo Wang (University of Adelaide) · Minh-Son To (Flinders University of South Australia) · Iris Lee (South Australia medical imaging) · Ee Win Khoo (The Queen Elizabeth Hospital) · Kerolos Hendy (Flinders University of South Australia) · Daniel Koh (Monash University, Malaysia Campus) · Yong Xia (Northwestern Polytechnical University) · Qi Wu (University of Adelaide)
Towards Robust Learning to Optimize with Theoretical Guarantees
Qingyu Song (The Chinese University of Hong Kong) · Wei Lin (The Chinese University of Hong Kong) · Juncheng Wang (Hong Kong Baptist University) · Hong Xu (CUHK)
Language-conditioned Detection Transformer
Jang Hyun Cho (University of Texas, Austin) · Philipp Krähenbühl (University of Texas at Austin)
Distilled Datamodel with Reverse Gradient Matching
Jingwen Ye (National University of Singapore) · Ruonan Yu (National University of Singapore) · Songhua Liu (None) · Xinchao Wang (National University of Singapore)
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
Takahiro Shirakawa (Kyushu University) · Seiichi Uchida (Kyushu University)
Digital Life Project: Autonomous 3D Characters with Social Intelligence
Zhongang Cai (Nanyang Technological University) · Jianping Jiang (Peking University) · Zhongfei Qing (SenseTime Research) · Xinying Guo (Nanyang Technological University) · Mingyuan Zhang (Nanyang Technological University) · Zhengyu Lin (Sensetime) · Haiy Mei (None) · Chen Wei (SenseTime International PTE. LTD.) · Wang Ruisi (Nanyang Technological University) · Wanqi Yin (SenseTime Research) · Liang Pan (Shanghai AI Lab) · Xiangyu Fan (Chinese University of Hong Kong) · Han Du (Universität des Saarlandes) · Peng Gao (SenseTime LTD.) · Zhitao Yang (SenseTime Co Ltd.) · Yang Gao (SenseTime) · Jiaqi Li (SenseTime) · Tianxiang Ren (Xiamen University) · YuKun Wei (Sensetime Research) · Xiaogang Wang (The Chinese University of Hong Kong) · Chen Change Loy (Nanyang Technological University) · Lei Yang (The Chinese University of Hong Kong) · Ziwei Liu (Nanyang Technological University)
Object Recognition as Next Token Prediction
Kaiyu Yue (University of Maryland, College Park) · Bor-Chun Chen (Facebook) · Jonas Geiping (University of Maryland, College Park) · Hengduo Li (Meta AI) · Tom Goldstein (University of Maryland, College Park) · Ser-Nam Lim (Meta AI)
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
Yiwen Ye (Northwestern Polytechnical University) · Yutong Xie (University of Adelaide) · Jianpeng Zhang (None) · Ziyang Chen (Northwestern Polytechnical University) · Qi Wu (University of Adelaide) · Yong Xia (Northwestern Polytechnical University)
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
Jakub Paplham (Czech Technical University in Prague) · Vojtech Franc (Czech Technical University in Prague, Faculty of Electrical Engineering)
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu (Carnegie Mellon University) · Peter Schaldenbrand (CMU, Carnegie Mellon University) · Beverley-Claire Okogwu (CMU, Carnegie Mellon University) · Wenxuan Peng (Nanyang Technological University) · Youngsik Yun (Dongguk University) · Andrew Hundt (Carnegie Mellon University) · Jihie Kim (Dongguk University) · Jean Oh (Carnegie Mellon University)
Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning
Jiahan Li (Harbin Institute of Technology) · Jiuyang Dong (Harbin Institute of Technology) · Shenjin Huang (None) · Xi Li (Department of Gastroenterology, Shenzhen Hospital, Peking University) · Junjun Jiang (Harbin Institute of Technology) · Xiaopeng Fan (Harbin Institute of Technology) · Yongbing Zhang (Harbin Institute of Technology)
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Razvan Pasca (None) · Alexey Gavryushin (ETHZ - ETH Zurich) · Muhammad Hamza (Department of Informatics, University of Zurich) · Yen-Ling Kuo (University of Virginia, Charlottesville) · Kaichun Mo (NVIDIA Research) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Otmar Hilliges (None) · Xi Wang (None)
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang (Hong Kong University of Science and Technology) · Liang Haixin (None) · Binh-Son Hua (Trinity College Dublin) · Tim, Yue Him Wong (Shenzhen University) · Put ANG (The Chinese University of Hong Kong) · Apple CHUI (Chinese University of Hong Kong) · Sai-Kit Yeung (The Hong Kong University of Science and Technology (HKUST))
Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering
Suyuan Liu (National University of Defense Technology) · Ke Liang (National University of Defense Technology) · Zhibin Dong (National University of Defense Technology) · Siwei Wang (Academy of Military Sciences) · Xihong Yang (National University of Defense Technology) · Sihang Zhou (National University of Defense Technology) · En Zhu (National University of Defense Technology) · Xinwang Liu (National University of Defense Technology)
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Bhargav Ghanekar (Rice University) · Salman Siddique Khan (Rice University) · Pranav Sharma (IIT Madras, India) · Shreyas Singh (Indian Institute of Technology, Madras) · Vivek Boominathan (Rice University) · Kaushik Mitra (Indian Institute of Technology, Madras, Dhirubhai Ambani Institute Of Information and Communication Technology) · Ashok Veeraraghavan (William Marsh Rice University)
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
Nicolae Ristea (University Politehnica of Bucharest) · Florinel Croitoru (University of Bucharest) · Radu Tudor Ionescu (None) · Marius Popescu (University of Bucharest) · Fahad Shahbaz Khan (MBZUAI; Linköping University) · Mubarak Shah (University of Central Florida)
You Only Need Less Attention Each Stage in Vision Transformers
Shuoxi Zhang (Huazhong University of Science and Technology) · Hanpeng Liu (Huazhong University of Science and Technology) · Stephen Lin (Microsoft Research) · Kun He (Huazhong University of Science and Technology)
Self-Calibrating Vicinal Risk Minimisation for Model Calibration
Jiawei Liu (Australian National University) · Changkun Ye (Australian National University) · Ruikai Cui (Australian National University) · Nick Barnes (Australian National University)
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
Chuangchuang Tan (Beijing Jiaotong University) · Huan Liu (Beijing Jiaotong University) · Yao Zhao (Beijing Jiaotong University) · Shikui Wei (Beijing Jiaotong University) · Guanghua Gu (Yan Shan University) · Ping Liu (Institute of High Performance Computing, Singapore, A*STAR) · Yunchao Wei (Beijing Jiaotong University)
Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception
Haoming Chen (East China Normal University) · Zhizhong Zhang (East China Normal University) · Yanyun Qu (Xiamen University) · Ruixin Zhang (Tencent Youtu Lab) · Xin Tan (East China Normal University) · Yuan Xie (East China Normal University)
Learning from One Continuous Video Stream
Joao Carreira (DeepMind) · Michael King (Fit) · Viorica Patraucean (DeepMind) · Dilara Gokay (Google DeepMind) · Catalin Ionescu (Google) · Yi Yang (DeepMind) · Daniel Zoran (DeepMind) · Joseph Heyward (Google) · Carl Doersch (DeepMind) · Yusuf Aytar (Google DeepMind) · Dima Damen (University of Bristol and Google DeepMind) · Andrew Zisserman (University of Oxford)
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain (Technische Universität München) · Tobias Kirschstein (Department of Informatics, Technische Universität München) · Markos Georgopoulos (Synthesia) · Martin Rünz (Synthesia) · Lourdes Agapito (University College London) · Matthias Nießner (Technical University of Munich)
CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution
Qingguo Liu (Nanjing University of Aeronautics and Astronautics) · Chenyi Zhuang (Nanjing University of Aeronautics and Astronautics) · Pan Gao (Nanjing University of Aeronautics and Astronautics, Tsinghua University) · Jie Qin (Nanjing University of Aeronautics and Astronautics)
GenesisTex: Adapting Image Denoising Diffusion to Texture Space
Chenjian Gao (None) · Boyan Jiang (Fudan University) · Xinghui Li (Tsinghua University) · YingPeng Zhang (South China University of Technology) · Qian Yu (Beihang University)
TEA: Test-time Energy Adaptation
Yige Yuan (None) · Bingbing Xu (Institute of Computing Technology, Chinese Academy of Sciences) · Liang Hou (Kuaishou Technology) · Fei Sun (Institute of Computing Technology, Chinese Academy of Sciences) · Huawei Shen (Institute of Computing Technology, Chinese Academy of Sciences) · Xueqi Cheng (Chinese Academy of Sciences)
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang (Dalian University of Technology) · Yuhao Wang (Dalian University of Technology) · Yang Liu (Dalian University of Technology) · Zhengzheng Tu (Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University) · Huchuan Lu (Dalian University of Technology)
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui (None) · Tengyu Liu (None) · Nian Liu (Beijing University of Posts and Telecommunications) · Yaodong Yang (Peking University) · Yixin Zhu (Peking University) · Siyuan Huang (Beijing Institute of General Artificial Intelligence)
$M^3$-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
Bin Pu (Hong Kong University of Science and Technology) · Liwen Wang (Anhui University) · Jiewen Yang (Hong Kong University of Science and Technology) · He Guannan (Sichuan University) · Xingbo Dong (Anhui University) · Shengli Li (Shenzhen Maternity and Child Healthcare Hospital) · Ying Tan (Shenzhen Maternity and Child Healthcare Hospital) · Ming Chen (Harbin Red Cross Central Hospital) · Zhe Jin (Anhui University) · Kenli Li (Hunan University) · Xiaomeng Li (The Hong Kong University of Science and Technology)
Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
Tsu-Ching Hsiao (Woven by Toyota) · Hao-Wei Chen (National Tsing Hua University) · Hsuan-Kung Yang (National Tsing Hua University) · Chun-Yi Lee (National Tsing Hua University)
QUADify: Extracting Meshes with Pixel-level Details and Materials from Images
Maximilian Frühauf (ETH Zurich & Disney Research | Studios) · Hayko Riemenschneider (Disney Research|Studios) · Markus Gross (Disney Research, Disney) · Christopher Schroers (Disney Research|Studios, Disney)
Improving Unsupervised Hierarchical Representation with Reinforcement Learning
Ruyi An (Nanyang Technological University) · Yewen Li (Nanyang Technological University) · Xu He (Huawei Technologies Ltd.) · Pengjie Gu (Nanyang Technological University) · Mengchen Zhao (South China University of Technology) · Dong Li (Huawei Technologies Ltd.) · Jianye Hao (Tianjin University) · Bo An (Nanyang Technological University) · Chaojie Wang (Skywork AI) · Mingyuan Zhou (The University of Texas at Austin)
All Rivers Run to the Sea: Private Learning with Asymmetric Flows
Yue Niu (USC) · Ramy E. Ali (Samsung) · Saurav Prakash (University of Illinois at Urbana-Champaign) · Salman Avestimehr (University of Southern California)
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
Tao Hu (Nanyang Technological University) · Fangzhou Hong (Nanyang Technological University) · Ziwei Liu (Nanyang Technological University)
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li (Peking University) · Mingxu Zhang (Beijing University of Posts and Telecommunications) · Yiran Geng (Peking University) · Haoran Geng (Peking University) · Yuxing Long (Beijing University of Posts and Telecommunications) · Yan Shen (Peking University) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Jiaming Liu (Peking University) · Hao Dong (None)
LASO: Language-guided Affordance Segmentation on 3D Object
Yicong Li (National University of Singapore) · Na Zhao (Singapore University of Technology and Design) · Junbin Xiao (None) · Chun Feng (University of Science and Technology of China) · Xiang Wang (University of Science and Technology of China) · Tat-seng Chua (National University of Singapore)
Dispersed Structured Light for Hyperspectral 3D Imaging
Suhyun Shin (Pohang University of Science and Technology) · Seokjun Choi (Pohang University of Science and Technology) · Felix Heide (Department of Computer Science, Princeton University) · Seung-Hwan Baek (POSTECH)
DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
Simon Doll (Eberhard-Karls-Universität Tübingen) · Niklas Hanselmann (Mercedes Benz Research & Development) · Lukas Schneider (Mercedes Benz Research & Development) · Richard Schulz (Mercedes Benz AG) · Marius Cordts (Mercedes-Benz) · Markus Enzweiler (Esslingen University of Applied Sciences) · Hendrik Lensch (University of Tübingen)
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
Takami Sato (None) · Justin Yue (University of California, Irvine) · Nanze Chen (University of Cambridge) · Ningfei Wang (University of California, Irvine) · Alfred Chen (University of California, Irvine)
FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell (University of Michigan) · Nilesh Kulkarni (None) · Linyi Jin (None) · Jeong Joon Park (Stanford University) · Justin Johnson (University of Michigan) · David Fouhey (New York University)
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang (Korea Advanced Institute of Science and Technology) · Fei Pan (University of Michigan - Ann Arbor) · Junmo Kim (Korea Advanced Institute of Science and Technology) · In So Kweon (Korea Advanced Institute of Science and Technology) · Chengzhi Mao (Columbia University)
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou (University of Mannheim) · Xudong Yan (City University of Macau) · Zhi-Qi Cheng (Carnegie Mellon University) · Yan Yan (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Qi Dai (Microsoft Research Asia) · Xian-Sheng Hua (Terminus Group)
Language-only Training of Zero-shot Composed Image Retrieval
Geonmo Gu (NAVER) · Sanghyuk Chun (NAVER AI Lab) · Wonjae Kim (NAVER) · Yoohoon Kang (NAVER) · Sangdoo Yun (NAVER)
Any-Shift Prompting for Generalization over Distributions
Zehao Xiao (University of Amsterdam) · Jiayi Shen (University of Amsterdam) · Mohammad Mahdi Derakhshani (University of Amsterdam) · Shengcai Liao (Inception Institute of Artificial Intelligence) · Cees G. M. Snoek (University of Amsterdam)
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Nagasinghe (Mohamed bin Zayed University of Artificial Intelligence) · Honglu Zhou (Rutgers University) · Malitha Gunawardhana (University of Auckland) · Martin Renqiang Min (NEC Laboratories America) · Daniel Harari (Weizmann Institute of Science) · Muhammad Haris Khan (None)
Time-, Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea (University of Tübingen) · Alexey Gritsenko (Google) · Cordelia Schmid (Inria / Google) · Anurag Arnab (Google)
Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion
Su Sun (Purdue University) · Henry Zhao (Bosch Research) · Yuliang Guo (Bosch US Research) · Ruoyu Wang (Bosch) · Xinyu Huang (Robert Bosch Research NA) · Yingjie Victor Chen (Purdue University) · Liu Ren (Bosch Research)
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan (Fudan University) · Zechen Bai (Show Lab, National University of Singapore) · Tianjun Xiao (Amazon) · Tong He (Amazon Web Services) · Max Horn (GSK plc) · Yanwei Fu (Fudan University) · Francesco Locatello (ISTA) · Zheng Zhang (New York University)
Perceptual-Oriented Video Frame Interpolation Via Asymmetric Synergistic Blending
Guangyang Wu (Shanghai Jiaotong University) · Xin Tao (Kuaishou) · Changlin Li (SeeKoo) · Wenyi Wang (University of Electronic Science and Technology of China) · Xiaohong Liu (Shanghai Jiao Tong University) · Qingqing Zheng ()
Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou (East China Normal University) · Yutong Ye (None) · Pengyu Zhang (East China Normal University) · Xian Wei (Chinese Academy of Sciences) · Mingsong Chen (East China Normal University)
iToF-flow-based High Frame Rate Depth Imaging
Yu Meng (Nanjing University) · Zhou Xue (Li Auto) · Xu Chang (Bytedance Inc) · Xuemei Hu (Nanjing University) · Tao Yue (Nanjing University)
Revisiting Counterfactual Problems in Referring Expression Comprehension
Zhihan Yu (Beijing University of Posts and Telecommunications) · Ruifan Li (Beijing University of Posts and Telecommunications)
Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
Yapeng Li (Wuhan University) · Yong Luo (Wuhan University) · Zengmao Wang (Wuhan University) · Bo Du (Wuhan University)
Continual Motion Prediction Learning Framework via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy
Dae Jun Kang (None) · Dongsuk Kum (Korea Advanced Institute of Science and Technology) · Sanmin Kim (KAIST)
Detector-Free Structure from Motion
Xingyi He (Zhejiang University) · Jiaming Sun (Image Derivative Inc.) · Yifan Wang (Zhejiang University) · Sida Peng (None) · Qixing Huang (University of Texas at Austin) · Hujun Bao (Zhejiang University) · Xiaowei Zhou (None)
DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
Clara Maria Fernandez Labrador (Disney Research) · Mertcan Akcay (Disney Research) · Eitan Abecassis (Walt Disney Company) · Joan Massich (Disney Research) · Christopher Schroers (Disney Research|Studios, Disney)
Material Palette: Extraction of Materials from a Single Image
Ivan Lopes (INRIA) · Fabio Pizzati (University of Oxford) · Raoul de Charette (Inria)
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
Chen Liu (The University of Queensland) · Peike Li (Futureverse AI) · Qingtao Yu (Australian National University) · Hongwei Sheng (University of Queensland) · Dadong Wang (CSIRO) · Lincheng Li () · Xin Yu (University of Queensland)
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
Yijun Yang (University of Technology Sydney) · Tianyi Zhou (University of Maryland, College Park) · kanxue Li (Yunnan University) · Dapeng Tao (Yunnan University) · Lusong Li (JDT) · Li Shen (JD Explore Academy) · Xiaodong He (JD AI Research) · Jing Jiang (University of Technology Sydney) · Yuhui Shi (Southern University of Science and Technology)
Towards Accurate and Robust Architectures via Neural Architecture Search
Yuwei Ou (Sichuan University) · Yuqi Feng (Sichuan University) · Yanan Sun (Sichuan University)
C$^\text{2}$RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Yiqun Lin (The Hong Kong University of Science and Technology) · Jiewen Yang (Hong Kong University of Science and Technology) · hualiang wang (HKUST) · Xinpeng Ding (The Hong Kong University of Science and Technology) · Wei Zhao (Beijing University of Aeronautics and Astronautics) · Xiaomeng Li (The Hong Kong University of Science and Technology)
Open-Vocabulary Video Anomaly Detection
Peng Wu (Northwest Polytechnical University Xi'an) · Xuerong Zhou (Northwest Polytechnical University Xi'an) · Guansong Pang (Singapore Management University) · Yujia Sun (Xi'an University of Electronic Science and Technology) · Jing Liu (Guangzhou Institute of Technology, Xidian University) · Peng Wang (Northwestern Polytechnical University) · Yanning Zhang (Northwestern Polytechnical University)
Language Model Guided Interpretable Video Action Reasoning
Ning Wang (Xidian University) · Guangming Zhu (Xidian University) · Hongsheng Li (Xi'an University of Electronic Science and Technology) · Liang Zhang (Xidian University) · Syed Afaq Ali Shah (Edith Cowan University) · Mohammed Bennamoun (University of Western Australia)
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan (University of Adelaide) · Yutong Xie (University of Adelaide) · Yuankai Qi (The University of Adelaide) · Lingqiao Liu (None) · Liyang Liu (University of Adelaide) · Bowen Zhang (The University of Adelaide) · Zhibin Liao (University of Adelaide) · Qi Wu (University of Adelaide) · Minh-Son To (Flinders University of South Australia) · Johan Verjans (University of Adelaide)
Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
Leili Goli (University of Toronto) · Cody Reading (Simon Fraser University) · Silvia Sellán (University of Toronto) · Alec Jacobson (University of Toronto and Adobe Systems) · Andrea Tagliasacchi (Simon Fraser University, Google Brain)
LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Yunshi HUANG (École de technologie supérieure, Université du Québec) · Fereshteh Shakeri (École de technologie supérieure) · Jose Dolz (École de technologie supérieure) · Malik Boudiaf (École de technologie supérieure) · Houda Bahig (University of Montreal) · Ismail Ben Ayed (ETS Montreal)
Building Optimal Neural Architectures using Interpretable Knowledge
Keith Mills (University of Alberta) · Fred Han (Huawei Technologies Ltd.) · Mohammad Salameh (Huawei Technologies Canada Ltd.) · Shengyao Lu (University of Alberta) · CHUNHUA ZHOU (Huawei Technologies Ltd.) · Jiao He (Huawei) · Fengyu Sun (Tongji University) · Di Niu (University of Alberta)
IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM
Minghao Yin (The University of Hong Kong) · Shangzhe Wu (Stanford University) · Kai Han (The University of Hong Kong)
Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping
Alex Costanzino (University of Bologna) · Pierluigi Zama Ramirez (University of Bologna) · Giuseppe Lisanti (University of Bologna) · Luigi Di Stefano (University of Bologna)
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Yuelang Xu (Tsinghua University) · Benwang Chen (Tsinghua University) · Zhe Li (Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Lizhen Wang (Tsinghua University) · Zerong Zheng (Tsinghua University) · Yebin Liu (Tsinghua University)
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang (University of California, San Diego) · Yuzhe Qin (University of California, San Diego) · Kaiming Kuang (University of California, San Diego) · Yigit Korkmaz (University of Southern California) · Akhilan Gurumoorthy (University of California, San Diego) · Hao Su (UCSD) · Xiaolong Wang (UCSD)
Improving Plasticity in Online Continual Learning via Collaborative Learning
Maorong Wang (The University of Tokyo) · Nicolas Michel (None) · Ling Xiao (None) · Toshihiko Yamasaki (None)
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
Xudong Wang (Electrical Engineering & Computer Science Department, University of California Berkeley) · Ishan Misra (Facebook) · Ziyun Zeng (UCB) · Rohit Girdhar (Meta) · Trevor Darrell (Electrical Engineering & Computer Science Department)
HIT: Estimating Internal Human Implicit Tissues from the Body Surface
Marilyn Keller (Max Planck Institute for Intelligent Systems) · Vaibhav ARORA (INRIA) · Abdelmouttaleb Dakri (None) · Shivam Chandhok (INRIA) · Jürgen Machann (Institute for Diabetes Research and Metabolic Diseases, Helmholtz Center Munich at the University of Tuebingen) · Andreas Fritsche (Eberhard-Karls-Universität Tübingen) · Michael J. Black (University of Tübingen) · Sergi Pujades (INRIA)
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Jia-Wei Liu (National University of Singapore) · Yan-Pei Cao (Tencent ARC Lab) · Jay Zhangjie Wu (National University of Singapore) · Weijia Mao (NUS) · Yuchao Gu (None) · Rui Zhao (None) · Jussi Keppo (National University of Singapore) · Ying Shan (Tencent) · Mike Zheng Shou (National University of Singapore)
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu (Chinese University of Hong Kong (Shenzhen)) · Yingwei Pan (HiDream.ai) · Yehao Li (HiDream.ai) · Ting Yao (JD AI Research) · Zhenglong Sun (The Chinese University of Hong Kong, Shenzhen) · Tao Mei (JD Explore Academy) · Chang-Wen Chen (The Hong Kong Polytechnic University)
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios
Jie Xu (University of Electronic Science and Technology of China) · Yazhou Ren (University of Electronic Science and Technology of China) · Xiaolong Wang (University of Electronic Science and Technology of China) · Lei Feng (Nanyang Technological University) · Zheng Zhang (Harbin Institute of Technology) · Gang Niu (RIKEN) · Xiaofeng Zhu (University of Electronic Science and Technology of China)
Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations
Rui Zhao (None) · Ruiqin Xiong (Peking University) · Jing Zhao (cncert) · Jian Zhang (Peking University) · Xiaopeng Fan (Harbin Institute of Technology) · Zhaofei Yu (Peking University) · Tiejun Huang (Peking University)
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
Fei Wang (Hefei University of Technology) · Dan Guo (Hefei University of Technology) · Kun Li (Hefei University of Technology) · Zhun Zhong (University of Nottingham) · Meng Wang (Hefei University of Technology)
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen (University of Science and Technology of China) · Yucheng Zhao (University of Science and Technology of China) · Yingfei Liu (Megvii Technology Inc.) · Fan Jia (Megvii Technology Inc.) · Yanhui Wang (None) · Chong Luo (Microsoft Research Asia) · Chi Zhang (Columbia University) · Tiancai Wang (Megvii Technology Inc.) · Xiaoyan Sun (University of Science and Technology of China) · Xiangyu Zhang (MEGVII Technology)
What Moves Together Belongs Together
Jenny Seidenschwarz (Department of Informatics, Technische Universität München) · Aljoša Ošep (Carnegie Mellon University) · Francesco Ferroni () · Simon Lucey (University of Adelaide) · Laura Leal-Taixe (NVIDIA)
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
Jin Yang (Xi'an Jiaotong University) · Ping Wei (None) · Huan Li (Xi'an Jiaotong University) · Ziyang Ren (Xi'an Jiaotong University)
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi (Google) · Skanda Koppula (Google Deepmind) · Shreya Pathak (Google) · Justin Chiu (Google) · Joseph Heyward (Google) · Viorica Patraucean (DeepMind) · Jiajun Shen (DeepMind) · Antoine Miech (DeepMind) · Andrew Zisserman (University of Oxford) · Aida Nematzadeh (Google Deepmind)
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Yufei Han (None) · Heng Guo (Beijing University of Posts and Telecommunications) · Koki Fukai (Osaka University) · Hiroaki Santo (Osaka University) · Boxin Shi (Peking University) · Fumio Okura (Osaka University) · Zhanyu Ma (Beijing University of Posts and Telecommunications) · Yunpeng Jia (Beijing University of Posts and Telecommunications)
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng (Rutgers University Google) · Qifei Wang (Google) · Wei Wei (Google) · Tingbo Hou (Google Research) · Matthias Grundmann (Google)
NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
Minh-Tuan Tran (Monash University) · Trung Le (Monash University) · Xuan-May Le (University of Melbourne) · Mehrtash Harandi (Monash University) · Quan Tran (ServiceNow) · Dinh Phung (Monash University)
Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation
Yeonguk Yu (Gwangju Institute of Science and Technology) · Sungho Shin (None) · Seunghyeok Back (Gwangju Institute of Science and Technology) · Minhwan Ko (Gwangju Institute of Science and Technology) · Sangjun Noh (Gwangju Institute of Science and Technology) · Kyoobin Lee (None)
LEDITS++: Limitless Image Editing using Text-to-Image Models
Manuel Brack (Technische Universität Darmstadt) · Felix Friedrich (TU Darmstadt, Hessian.AI) · Katharina Kornmeier (Align Technology) · Linoy Tsaban (Hugging Face) · Patrick Schramowski (TU Darmstadt) · Kristian Kersting (TU Darmstadt) · Apolinário Passos (Universidade de Brasília)
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
Yun Liu (Tsinghua University) · Haolin Yang (Beijing University of Posts and Telecommunications) · Xu Si (Tsinghua University) · Ling Liu (Beijing Institute of Technology) · Zipeng Li (Tsinghua University) · Yuxiang Zhang (Tsinghua University) · Yebin Liu (Tsinghua University) · Li Yi ()
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
Rongjie Li (SIST, ShanghaiTech University) · Yu Wu (ShanghaiTech University) · Xuming He (ShanghaiTech University)
The Neglected Tails of Vision-Language Models
Shubham Parashar (Texas A&M University - College Station) · Tian Liu (Texas A&M University - College Station) · Zhiqiu Lin (Carnegie Mellon University) · Xiangjue Dong (Texas A&M University - College Station) · Yanan Li (Zhejiang Lab) · James Caverlee (Texas A&M University) · Deva Ramanan (Carnegie Mellon University) · Shu Kong (University of Macau, Texas A&M University)
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu (North Carolina State University) · Ce Zheng (University of Central Florida) · Ming Qian (None) · Nan Xue (Ant Group) · Chen Chen () · Zhebin Zhang (OPPO) · Chen Li (Innopeak Technology Inc.) · Tianfu Wu ()
SODA: Bottleneck Diffusion Models for Representation Learning
Drew Hudson (Google DeepMind) · Daniel Zoran (DeepMind) · Mateusz Malinowski (MoonValley AI) · Andrew Lampinen (Google DeepMind) · Andrew Jaegle (Google DeepMind) · James McClelland (Stanford University and Google DeepMind) · Loic Matthey (DeepMind) · Felix Hill (Google) · Alexander Lerchner (Google DeepMind)
Scaling Up Dynamic 3D Human-Scene Interaction Modelling
Nan Jiang (Peking University) · Zhiyuan Zhang (Department of Automation, Tsinghua University) · Hongjie Li (Peking University) · Xiaoxuan Ma (Peking University) · Zan Wang (Beijing Institute of Technology) · Yixin Chen (BIGAI) · Tengyu Liu (None) · Yixin Zhu (Peking University) · Siyuan Huang (Beijing Institute of General Artificial Intelligence)
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
Zikai Xiao (None) · Guo-Ye Yang (None) · Xue Yang (Shanghai AI Laboratory) · Tai-Jiang Mu (Tsinghua University) · Junchi Yan (Shanghai Jiao Tong University) · Shi-Min Hu (Tsinghua University)
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana (Google) · Daniel Glasner (Google) · Srikumar Ramalingam (Google) · Andreas Veit (Google) · Ayan Chakrabarti (Google) · Sanjiv Kumar (Google)
Data-Free Quantization via Pseudo-label Filtering
Chunxiao Fan (Hefei University of Technology) · Ziqi Wang (Hefei University of Technology) · Dan Guo (Hefei University of Technology) · Meng Wang (Hefei University of Technology)
Fitting Flats to Flats
Gabriel Dogadov (Technische Universität Berlin) · Ugo Finnendahl (Technische Universität Berlin) · Marc Alexa (TU Berlin)
Bayesian Diffusion Models for 3D Shape Reconstruction
Haiyang Xu (University of Science and Technology of China) · Yu lei (Shanghai Jiao Tong University) · Zeyuan Chen (University of California, San Diego) · Xiang Zhang (University of California, San Diego) · Yue Zhao (Tsinghua University) · Yilin Wang (Tsinghua University) · Zhuowen Tu (University of California, San Diego)
HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild
Supreeth Narasimhaswamy (Stony Brook University, New York) · Huy Anh Nguyen (Stony Brook University) · Lihan Huang (University of Science and Technology of China) · Minh Hoai (State University of New York, Stony Brook)
Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation
Yuan Xiao (Nanjing University) · Shiqing Ma (University of Massachusetts at Amherst) · Juan Zhai (University of Massachusetts at Amherst) · Chunrong Fang (Nanjing University) · Jinyuan Jia (Pennsylvania State University) · Zhenyu Chen (Nanjing University)
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
Rashindrie Perera (University of Melbourne) · Saman Halgamuge (University of Melbourne)
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding
Xu Cao (University of Illinois Urbana-Champaign) · Tong Zhou (Tencent AI Lab) · Yunsheng Ma (Purdue University) · Wenqian Ye (University of Virginia) · Can Cui (Purdue University) · Kun Tang (Tencent) · Zhipeng Cao (Tencent) · Kaizhao Liang (University of Texas at Austin) · Ziran Wang (Purdue University) · James Rehg (None) · chao zheng (Tencent)
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee (Korea University) · Joonmyung Choi (Korea University) · Hyunwoo J. Kim (Korea University)
Generative Multi-modal Models are Good Class Incremental Learners
Xusheng Cao (Nankai University) · Haori Lu (Nankai University) · Linlan Huang (Nankai University) · Xialei Liu (Nankai University) · Ming-Ming Cheng (Nankai University, Tsinghua University)
WaveMo: Learning Wavefront Modulations to See Through Scattering
Mingyang Xie (University of Maryland, College Park) · Haiyun Guo (Rice University) · Brandon Y. Feng (Massachusetts Institute of Technology) · Lingbo Jin (Rice University) · Ashok Veeraraghavan (William Marsh Rice University) · Christopher Metzler (University of Maryland, College Park)
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations
Maximilian Augustin (University of Tuebingen) · Yannic Neuhaus (Eberhard-Karls-Universität Tübingen) · Matthias Hein (University of Tübingen)
Backpropagation-free Network for 3D Test-time Adaptation
YANSHUO WANG (CSIRO) · Ali Cheraghian (CSIRO) · Zeeshan Hayder (CSIRO) · JIE HONG (Australian National University) · Sameera Ramasinghe (Amazon) · Shafin Rahman (North South University) · David Ahmedt-Aristizabal (CSIRO) · Xuesong Li (Australian National University) · Lars Petersson (CSIRO) · Mehrtash Harandi (Monash University)
Finsler-Laplace-Beltrami Operators with Application to Shape Analysis
Simon Weber (Technische Universität München) · Thomas Dagès (Technion - Israel Institute of Technology) · Maolin Gao (None) · Daniel Cremers (Technical University Munich)
NeRFiller: Completing Scenes via Generative 3D Inpainting
Ethan Weber (University of California Berkeley) · Aleksander Holynski (UC Berkeley & Google Research) · Varun Jampani (Google Research) · Saurabh Saxena (None) · Noah Snavely (Google / Cornell) · Abhishek Kar (Google) · Angjoo Kanazawa (UC Berkeley)
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim (Beeble Inc.) · Minje Jang (Beeble Inc.) · Wonjun Yoon (Beeble Inc.) · Jisoo Lee (Beeble Inc.) · Donghyun Na (Beeble Inc.) · Sanghyun Woo (New York University)
Sparse Semi-Detr: Sparse Learnable Queries for Semi-Supervised Object Detection
Tahira Shehzadi () · Khurram Azeem Hashmi (DFKI - German Research Center for AI) · Didier Stricker (Universität Kaiserslautern) · Muhammad Zeshan Afzal (German Research Center for AI)
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng (None) · Zhicheng Guo (Tsinghua University) · Jingwen Wu (University of Toronto) · Kechen Fang (Tsinghua University) · Peng Li (Tsinghua University) · Huaping Liu (Tsinghua University) · Yang Liu (Tsinghua University)
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Jiazhou Zhou (Hong Kong University of Science and Technology) · Xu Zheng (HKUST) · Yuanhuiyi Lyu (Hong Kong University of Science and Technology (Guangzhou)) · Lin Wang (Hong Kong University of Science and Technology)
4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
Wenbo Wang (ETHZ - ETH Zurich) · Hsuan-I Ho (ETHZ - ETH Zurich) · Chen Guo (ETH Zurich) · Boxiang Rong (ETHZ - ETH Zurich) · Artur Grigorev () · Jie Song (ETHZ - ETH Zurich) · Juan Jose Zarate (Department of Computer Science, ETHZ - ETH Zurich) · Otmar Hilliges (None)
Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer
Wenqiao Zhang (National University of Singapore) · Zheqi Lv (Zhejiang University)
Weak-to-Strong 3D Object Detection with X-Ray Distillation
Alexander Gambashidze (AIRI) · Aleksandr Dadukin (Higher School of Economics) · Maksim Golyadkin (AIRI) · Maria Razzhivina (Higher School of Economics) · Ilya Makarov (Moscow State Institute of Steel and Alloys)
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren Yang (Apple) · Anurag Ranjan (Apple) · Jen-Hao Rick Chang (Apple) · Raviteja Vemulapalli (None) · Oncel Tuzel (Apple)
YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection
Alon Zolfi (Ben-Gurion University of the Negev) · Guy Amit (Ben-Gurion University of the Negev) · Amit Baras () · Satoru Koda (Fujitsu Limited) · Ikuya Morikawa (Fujitsu Research) · Yuval Elovici (Ben-Gurion University of the Negev) · Asaf Shabtai (Ben-Gurion University of the Negev)
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang (Shanghai Jiao Tong University & Tsinghua University) · Yang Liu (Tsinghua University) · Yang Hua (Queen's University Belfast) · Jian Cao (Shanghai Jiaotong University)
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
CONG MA (SenseAuto Research) · Qiao Lei (SenseAuto Research) · Chengkai Zhu (SenseAuto Research) · Kai Liu (SenseAuto Research) · Zelong Kong (SenseAuto Research) · Liqing (SenseAuto) · Xueqi Zhou (Beijing Sensetime Technology Development Co., Ltd.) · Yuheng KAN (Zhejiang University) · Wei Wu (Tsinghua University)
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
song yiran (None) · Qianyu Zhou (Shanghai Jiao Tong University) · Xiangtai Li (Nanyang Technological University) · Deng-Ping Fan (ETH Zurich) · Xuequan Lu (La Trobe University) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation
Yiwei Bao (Beihang University) · Feng Lu (Beihang University, Tsinghua University)
NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Ziyi Chen (Zhejiang University) · Xiaolong Wu (Georgia Institute of Technology) · Yu Zhang (Zhejiang University)
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
Gihun Lee (KAIST AI) · Minchan Jeong (Korea Advanced Institute of Science and Technology) · SangMook Kim (KAIST) · Jaehoon Oh (Samsung Advanced Institute of Technology) · Se-Young Yun (KAIST)
Learning Multi-dimensional Human Preference for Text-to-Image Generation
Sixian Zhang (None) · Bohan Wang (Kuaishou) · Junqiang Wu (Kuaishou) · Yan Li (Kuaishou) · Tingting Gao (China Agricultural University) · Di ZHANG (Kuaishou Technology) · Zhongyuan Wang (Kuaishou Inc.)
Improved Visual Grounding through Self-Consistent Explanations
Ruozhen He (Rice University) · Paola Cascante-Bonilla (Rice University) · Ziyan Yang (Rice University) · Alex Berg (None) · Vicente Ordonez (Rice University)
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
Xiaoyang Chen (University of Pennsylvania) · Hao Zheng (University of Pennsylvania) · Yuemeng LI (University of Pennsylvania) · Yuncong Ma (University of Pennsylvania) · Liang Ma (University of Pennsylvania) · Hongming Li (University of Pennsylvania) · Yong Fan (University of Pennsylvania)
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Hao Li (The Chinese University of Hong Kong) · Xue Yang (Shanghai AI Laboratory) · Zhaokai Wang (Shanghai Jiao Tong University) · Xizhou Zhu (Shanghai AI Laboratory) · Jie Zhou (None) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Xiaogang Wang (The Chinese University of Hong Kong) · Hongsheng Li (The Chinese University of Hong Kong) · Lewei Lu (SenseTime) · Jifeng Dai (Tsinghua University)
OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi (Korea Advanced Institute of Science and Technology) · Hyeonjoong Jang (None) · Min H. Kim (KAIST)
Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
Alexander Raistrick (Princeton University) · Lingjie Mei (Princeton University) · Karhan Kayan (Princeton University) · David Yan (Princeton University) · Yiming Zuo (Princeton University) · Beining Han (Department of Computer Science, Princeton University) · Hongyu Wen (Princeton University) · Meenal Parakh (Princeton University) · Stamatis Alexandropoulos (Princeton University) · Lahav Lipson (Princeton University) · Zeyu Ma (Princeton University) · Jia Deng (Princeton University)
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li (Johns Hopkins University) · Bhavan Jasani (Amazon) · Peng Tang (Amazon) · Shabnam Ghadar (Amazon)
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
Xinpeng Ding (The Hong Kong University of Science and Technology) · Jianhua Han (Huawei Technologies Ltd.) · Hang Xu (Huawei Noah's Ark Lab) · Xiaodan Liang (Sun Yat-sen University) · Wei Zhang (Huawei Technologies Ltd.) · Xiaomeng Li (The Hong Kong University of Science and Technology)
Reconstructing Hands in 3D with Transformers
Georgios Pavlakos (University of Texas at Austin) · Dandan Shan (None) · Ilija Radosavovic () · Angjoo Kanazawa (UC Berkeley) · David Fouhey (New York University) · Jitendra Malik (University of California at Berkeley)
Systematic comparison of semi-supervised and self-supervised learning for medical image classification
Zhe Huang (Tufts University) · Ruijie Jiang (Tufts University) · Shuchin Aeron (Tufts University) · Michael C. Hughes (Tufts University)
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He (None) · Hengduo Li (Meta AI) · Young Kyun Jang (Meta AI) · Menglin Jia (Facebook) · Xuefei Cao (Meta) · Ashish Shah (Meta) · Abhinav Shrivastava (University of Maryland) · Ser-Nam Lim (Meta AI)
Hierarchical Correlation Clustering and Tree Preserving Embedding
Morteza Haghir Chehreghani (Chalmers University of Technology) · Mostafa Haghir Chehreghani (Amirkabir University of Technology)
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Minghua Liu (University of California, San Diego) · Ruoxi Shi (University of California, San Diego) · Linghao Chen (None) · Zhuoyang Zhang (IIIS, Tsinghua University) · Chao Xu (University of California, Los Angeles) · Xinyue Wei (University of California, San Diego) · Hansheng Chen (Stanford University) · Chong Zeng (Zhejiang University) · Jiayuan Gu (University of California, San Diego) · Hao Su (UCSD)
C$^2$KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation
Fushuo Huo (Hong Kong Polytechnic University) · Wenchao Xu (The Hong Kong Polytechnic University) · Jingcai Guo (The Hong Kong Polytechnic University) · Haozhao Wang (Huazhong University of Science and Technology) · Song Guo (Department of Computer Science and Engineering, Hong Kong University of Science and Technology)
Distilling Vision-Language Models on Millions of Videos
Yue Zhao (UT Austin) · Long Zhao (Google DeepMind) · Xingyi Zhou (Google) · Jialin Wu (Google) · Chun-Te Chu (Google Research) · Hui Miao (Google) · Florian Schroff (Google) · Hartwig Adam (Google Research) · Ting Liu (Google Research) · Boqing Gong (Google) · Philipp Krähenbühl (University of Texas at Austin) · Liangzhe Yuan (Google)
SNI-SLAM: Semantic Neural Implicit SLAM
Siting Zhu (Shanghai Jiao Tong University) · Guangming Wang (University of Cambridge) · Hermann Blum (Computer Vision and Geometry Lab, ETH Zürich) · Jiuming Liu (Shanghai Jiao Tong University) · LiangSong (China University of Mining Technology - Xuzhou) · Marc Pollefeys (ETH Zurich / Microsoft) · Hesheng Wang (Shanghai Jiao Tong University)
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
Junyi Yao (None) · Yijiang Liu (Nanjing University) · Zhen Dong (PhD/Postdoc UC Berkeley) · Mingfei Guo (Stanford University) · Helan Hu (Peking University) · Kurt Keutzer (EECS, UC Berkeley) · Li Du (Nanjing University) · Daquan Zhou (National University of Singapore) · Shanghang Zhang (Peking University)
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi (National University of Singapore) · Zehong Yan (National University of Singapore) · Wynne Hsu (National University of Singapore) · Mong Li Lee (National University of Singapore)
Spatial-Aware Regression for Keypoint Localization
Dongkai Wang (Peking University) · Shiliang Zhang (Peking University)
Shadow-Enlightened Image Outpainting
Hang Yu (Shanghai University) · Ruilin Li (Shanghai University) · Shaorong Xie (Shanghai University) · Jiayan Qiu (University of Leicester)
Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation
Jiapeng Su (Harbin Institute of Technology) · Qi Fan (The Hong Kong University of Science and Technology) · Wenjie Pei (Harbin Institute of Technology) · Guangming Lu (Harbin Institute of Technology, Shenzhen) · Fanglin Chen (Harbin Institute of Technology (Shenzhen))
Few-Shot Object Detection with Foundation Models
Guangxing Han (Columbia University) · Ser-Nam Lim (Meta AI)
LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes
Yanwen Guo (Nanjing University) · Yuanqi Li (Nanjing University) · Dayong Ren (Nanjing University) · Xiaohong Zhang () · Jiawei Li (Nanjing University) · Liang Pu (None) · Changfeng Ma (Nanjing University) · xiaoyu zhan (Nanjing University) · Jie Guo (Nanjing University) · Mingqiang Wei (Nanjing University of Aeronautics and Astronautics) · Yan Zhang (None) · Piaopiao Yu (Nanjing University) · Shuangyu Yang (Nanjing University) · Donghao Ji (Nanjing University) · Huisheng Ye (Nanjing University) · Hao Sun (Nanjing University) · Yansong Liu (Nanjing University) · Yinuo Chen (Nanjing University) · Jiaqi Zhu (Nanjing University) · Hongyu Liu (Nanjing University)
CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
Townim Chowdhury (None) · Kewen Liao (Australian Catholic University) · Vu Minh Hieu Phan (University of Adelaide) · Minh-Son To (Flinders University of South Australia) · Yutong Xie (University of Adelaide) · Kevin Hung (Royal Adelaide Hospital) · David Ross (University of South Australia) · Anton van den Hengel (University of Adelaide) · Johan Verjans (University of Adelaide) · Zhibin Liao (University of Adelaide)
PREGO: online mistake detection in PRocedural EGOcentric videos
Alessandro Flaborea (Sapienza University of Rome / ItalAI) · Guido M. D'Amely di Melendugno (University of Roma "La Sapienza") · Leonardo Plini (Sapienza University of Rome & INFN) · Luca Scofano (University of Roma "La Sapienza") · Edoardo De Matteis (Sapienza University) · Antonino Furnari (University of Catania) · Giovanni Maria Farinella (University of Catania, Italy) · Fabio Galasso (None)
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu (Department of Computer Science, University of Washington) · Yingwei Li (Google) · Nan Liu (Google) · Hao Peng (Google) · Dawei Yang (Google Inc.) · Ira Kemelmacher-Shlizerman (UW + Google)
Cross-view and Cross-pose Completion for 3D Human Understanding
Matthieu Armando (Naver Labs Europe) · Salma Galaaoui (Naver Labs Europe) · Fabien Baradel (NAVER LABS Europe) · Thomas Lucas (Naver Labs Europe) · Vincent Leroy (Naver Labs Europe) · Romain BRÉGIER (None) · Philippe Weinzaepfel (Naver Labs Europe) · Grégory Rogez (Naver Labs Europe)
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge (Beijing Institute of Technology) · Jixiang Luo (SenseTime) · XINJIE ZHANG (The Hong Kong University of Science and Technology) · Tongda Xu (Tsinghua University) · Guo Lu (Shanghai Jiaotong University) · Dailan He (The Chinese University of Hong Kong) · Jing Geng (Beijing Institute of Technology) · Yan Wang (Tsinghua University) · Jun Zhang (The Hong Kong University of Science and Technology) · Hongwei Qin (SenseTime Co.)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang (Tsinghua University) · Ruiyuan Gao (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Kai Chen (The Hong Kong University of Science and Technology) · Kaiqiang Zhou (Huawei Technologies Ltd.) · Yingjie CAI (The Chinese University of Hong Kong) · Lanqing Hong (Huawei Technologies Ltd.) · Zhenguo Li (Huawei) · Lihui Jiang (Huawei Technologies Ltd.) · Dit-Yan Yeung (Hong Kong University of Science and Technology) · Qiang Xu (The Chinese University of Hong Kong) · Kai Zhang (Shenzhen International Graduate School, Tsinghua University)
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang (Shanghai Jiaotong University) · Songpengcheng Xia () · Lei Chu (University of Southern California) · Jiarui Yang (Shanghai Jiaotong University) · Qi Wu (Shanghai Jiaotong University) · Ling Pei (Shanghai Jiao Tong University)
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma (Shanghai Jiao Tong University) · Xieyuanli Chen (National University of Defense Technology) · Jiawei Huang (HAOMO Technology Co., Ltd) · Jingyi Xu (Beijing Institute of Technology) · Zhen Luo (Beijing Institute of Technology) · Jintao Xu (Xi'an Jiaotong University) · Weihao Gu (Tsinghua University) · Rui Ai (HAOMO.AI Technology Co., Ltd.) · Hesheng Wang (Shanghai Jiao Tong University)
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Li Hu (Alibaba)
Effective Video Mirror Detection with Inconsistent Motion Cues
Alex Warren (Swansea University) · Ke Xu (City University of Hong Kong) · Jiaying Lin (City University of Hong Kong) · Gary Tam (Swansea University) · Rynson W.H. Lau (City University of Hong Kong)
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Kelvin C.K. Chan (Google DeepMind) · Yang Zhao (Google) · Xuhui Jia (Google) · Ming-Hsuan Yang (University of California at Merced) · Huisheng Wang (Google)
Spatio-Temporal Turbulence Mitigation: A Translational Perspective
Xingguang Zhang (Purdue University) · Nicholas M Chimitt (Purdue University) · Yiheng Chi (Purdue University) · Zhiyuan Mao (Samsung Research America) · Stanley H. Chan (Purdue University, USA)
Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
Yanchen Dong (Peking University) · Ruiqin Xiong (Peking University) · Jian Zhang (Peking University) · Zhaofei Yu (Peking University) · Xiaopeng Fan (Harbin Institute of Technology) · Shuyuan Zhu (University of Electronic Science and Technology of China) · Tiejun Huang (Peking University)
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham (University of Maryland, College Park) · Chuong Huynh (University of Maryland, College Park) · Ser-Nam Lim (Meta AI) · Abhinav Shrivastava (University of Maryland)
Looking 3D: Anomaly Detection with 2D-3D Alignment
Ankan Kumar Bhunia (The University of Edinburgh) · Changjian Li (University of Edinburgh) · Hakan Bilen (University of Edinburgh)
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
Jiuming Liu (Shanghai Jiao Tong University) · Guangming Wang (University of Cambridge) · Weicai Ye (Zhejiang University) · Chaokang Jiang () · Jinru Han (Shanghai Jiao Tong University) · Zhe Liu (Shanghai Jiaotong University) · Guofeng Zhang (Zhejiang University) · Dalong Du (PhiGent Robotics) · Hesheng Wang (Shanghai Jiao Tong University)
Shadows Don’t Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar (Department of Computer Science at University of Illinois Urbana-Champaign) · Hanlin Mai (University of Illinois Urbana Champaign) · Amitabh Mahapatra (University of Illinois Urbana-Champaign) · David Forsyth (University of Illinois at Urbana-Champaign) · Svetlana Lazebnik (University of Illinois at Urbana-Champaign) · Anand Bhattad (None)
Point Cloud Pre-training with Diffusion Models
xiao zheng (None) · Xiaoshui Huang (Shanghai AI Laboratory) · Guofeng Mei (Fondazione Bruno Kessler) · Zhaoyang Lyu (Shanghai AI Laboratory) · Yuenan Hou (Shanghai AI Laboratory) · Wanli Ouyang (University of Sydney) · Bo Dai (Shanghai AI Laboratory) · Yongshun Gong (Shandong University)
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
David Stotko (University of Bonn) · Nils Wandel (University of Bonn) · Reinhard Klein (University of Bonn)
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song (Dealicious Inc) · Jooyoung Yoon (Dealicious Inc) · Taebaek Hwang (None) · Shunghyun Choi (Dealicious Inc.) · Yeong Hyeon Gu (Sejong University) · Yannis Avrithis (IARAI)
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
Zihua Liu (Tokyo Institute of Technology) · Hiroki Sakuma (T2 Inc.) · Masatoshi Okutomi (Tokyo Institute of Technology)
Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu (Shanghai Jiao Tong University) · Liyao Xiang (Shanghai Jiao Tong University) · Hangyu Ye (Shanghai Jiaotong University) · Dixi Yao (University of Toronto) · Pengzhi Chu (Shanghai Jiaotong University) · Baochun Li (University of Toronto)
Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
Nirat Saini (University of Maryland College Park) · Khoi Pham (University of Maryland, College Park) · Abhinav Shrivastava (University of Maryland)
Transductive Zero-Shot $\&$ Few-Shot CLIP
Ségolène Martin (TU Berlin) · Yunshi HUANG (École de technologie supérieure, Université du Québec) · Fereshteh Shakeri (École de technologie supérieure) · Jean-Christophe Pesquet (CentraleSupelec) · Ismail Ben Ayed (ETS Montreal)
SLICE: Stabilized LIME for Consistent Explanations for Image Classification
Revoti Prasad Bora (Norwegian University of Science and Technology) · Kiran Raja (Norwegian University of Science and Technology) · Philipp Terhörst (Paderborn University, Germany) · Raymond Veldhuis (University of Twente) · Raghavendra Ramachandra (Norwegian University of Science and Technology (NTNU))
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Lei Li (Technical University of Munich) · Angela Dai ()
Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking
Cheng-Yao Hong (Academia Sinica) · Yen-Chi Hsu (Department of computer science and informational engineering, National Taiwan University) · Tyng-Luh Liu (IIS/Academia Sinica)
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang (None) · Liangbin Xie (Macau) · Xintao Wang (Tencent) · Ziyang Yuan (Tsinghua University) · Xiaodong Cun (Tencent AI Lab) · Yixiao Ge (Tencent) · Jiantao Zhou (University of Macau) · Chao Dong (SIAT) · Rui Huang (The Chinese University of Hong Kong, Shenzhen) · Ruimao Zhang (The Chinese University of Hong Kong (Shenzhen)) · Ying Shan (Tencent)
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen (University of Oxford) · Junyu Xie (University of Oxford) · Iro Laina (University of Oxford) · Andrea Vedaldi (University of Oxford)
3D Neural Edge Reconstruction
Lei Li (ETH Zurich) · Songyou Peng (ETH Zurich & MPI Tübingen) · Zehao Yu (None) · Shaohui Liu (ETH Zurich) · Rémi Pautrat (Microsoft Mixed Reality & AI lab) · Xiaochuan Yin (Utopilot) · Marc Pollefeys (ETH Zurich / Microsoft)
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions
Namitha Padmanabhan (University of Maryland) · Matthew A Gwilliam (University of Maryland, College Park) · Pulkit Kumar (None) · Shishira R Maiya (University of Maryland) · Max Ehrlich (University of Maryland, College Park) · Abhinav Shrivastava (University of Maryland)
Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
Zhiheng Cheng (East China Normal University) · Qingyue Wei (Stanford University) · Hongru Zhu (None) · Yan Wang (East China Normal University) · Liangqiong Qu (The University of Hong Kong) · Wei Shao (University of Florida) · Yuyin Zhou (UC Santa Cruz)
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao (Rutgers University, New Brunswick) · Samuel Schulter (NEC Laboratories America) · Long Zhao (Google DeepMind) · Zhixing Zhang (Rutgers University) · Vijay Kumar BG (NEC Laboratories America) · Yumin Suh (NEC Labs America) · Manmohan Chandraker (UC San Diego) · Dimitris N. Metaxas (Rutgers)
Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
Diandian Guo (Universität Stuttgart) · Deng-Ping Fan (ETH Zurich) · Tongyu Lu (ETHZ - ETH Zurich) · Christos Sakaridis (ETH Zurich) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
Tobias Kirschstein (Department of Informatics, Technische Universität München) · Simon Giebenhain (Technische Universität München) · Matthias Nießner (Technical University of Munich)
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
Jiaming Li (Baidu) · Jiacheng Zhang (SUN YAT-SEN UNIVERSITY) · Jichang Li (The University of Hong Kong) · Ge Li (Peking University Shenzhen Graduate School) · Si Liu (Beihang University) · Liang Lin (Sun Yat-sen University) · Guanbin Li (Sun Yat-sen University)
Language-driven Grasp Detection
An Dinh Vuong (FPT Software - AI Center) · Minh Nhat VU (ACIN Institute, TU Wien/ Austrian Institute of Technology) · Baoru Huang (University College London, University of London) · Nghia Nguyen (FPT Software) · Hieu Le (FPT Software AI Center) · Thieu Vo (Ton Duc Thang University) · Anh Nguyen (University of Liverpool)
Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
Zhiwei Yang (Fudan university) · Kexue Fu (Qilu University of Technology (Shandong Academy of Sciences)) · Minghong Duan (Fudan University) · Linhao Qu (Fudan University) · Shuo Wang (Fudan University) · Zhijian Song (Fudan University)
Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation
Ziyang Chen (Northwestern Polytechnical University) · Yongsheng Pan (ShanghaiTech University) · Yiwen Ye (Northwestern Polytechnical University) · Mengkang Lu (nwpu) · Yong Xia (Northwestern Polytechnical University)
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Zhijing Shao (The Hong Kong University of Science and Technology (Guangzhou)) · Wang Zhaolong (Tsinghua University) · Zhuang Li (Prometheus Vision Technology Co., Ltd.) · Duotun Wang (The Hong Kong University of Science and Technology (Guangzhou)) · Xiangru Lin () · Yu Zhang (Prometheus Vision Technology Co., Ltd.) · Mingming Fan (Hong Kong University of Science and Technology) · Zeyu Wang (The Hong Kong University of Science and Technology (Guangzhou))
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
Dongkai Wang (Peking University) · shiyu xuan (Peking University) · Shiliang Zhang (Peking University)
Instance-based Max-margin for Practical Few-shot Recognition
Minghao Fu (None) · Ke Zhu (Nanjing University)
GARField: Group Anything with Radiance Fields
Chung Min Kim (University of California, Berkeley) · Mingxuan Wu (None) · Justin Kerr (University of California Berkeley) · Ken Goldberg (University of California Berkeley) · Matthew Tancik (Luma AI) · Angjoo Kanazawa (UC Berkeley)
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham (Johann Wolfgang Goethe Universität Frankfurt am Main) · Felix Petersen (Stanford University) · Vittorio Ferrari (Synthesia) · Hilde Kuehne (University of Bonn, MIT-IBM Watson AI Lab)
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Liangxiao Hu (Harbin Institute of Technology) · Hongwen Zhang (Beijing Normal University) · Yuxiang Zhang (Tsinghua University) · Boyao ZHOU (Tsinghua University) · Boning Liu (Department of Automation, Tsinghua University) · Shengping Zhang (Harbin Institute of Technology) · Liqiang Nie (Harbin Institute of Technology (Shenzhen))
ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation
Yan Di (Technische Universität München) · Chenyangguang Zhang (Tsinghua University) · Chaowei Wang (Northwestern Polytechnical University, Northwest Polytechnical University Xi'an) · Ruida Zhang (Department of Automation, Tsinghua University) · Guangyao Zhai (Technical University of Munich) · Yanyan Li (Technical University Munich) · Bowen Fu (Technische Universität München) · Xiangyang Ji (Tsinghua University) · Shan Gao (Northwest Polytechnical University Xi'an)
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li (None) · Tobias Fischer (ETH Zurich) · Mattia Segu (ETH Zurich - Swiss Federal Institute of Technology) · Marc Pollefeys (ETH Zurich / Microsoft) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Federico Tombari (Google, TUM)
SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction
Yuan Li (Institute of Automation, Chinese Academy of Sciences) · Zhihao Liu (The University of Tokyo) · Bedrich Benes (Purdue University) · Xiaopeng Zhang (Institute of Automation, Chinese Academy of Sciences) · Jianwei Guo (Institute of Automation, Chinese Academy of Sciences)
Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair
Jeonghoon Park (Korea Advanced Institute of Science and Technology) · Chaeyeon Chung (Korea Advanced Institute of Science and Technology) · Jaegul Choo (Korea Advanced Institute of Science and Technology)
Prompting Vision Foundation Models for Pathology Image Analysis
CHONG YIN (Hong Kong Baptist University) · Siqi Liu (Shenzhen Research Institute of Big Data) · Kaiyang Zhou (Hong Kong Baptist University) · Vincent Wong (The Chinese University of Hong Kong) · Pong C. Yuen (Hong Kong Baptist University)
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
Yuwen Tan (Huazhong University of Science and Technology) · Qinhao Zhou (Huazhong University of Science and Technology) · Xiang Xiang (Huazhong University of Science and Technology) · Ke Wang (Alibaba Group) · Yuchuan Wu (Alibaba Group) · Yongbin Li (Alibaba Group)
Error Detection in Egocentric Procedural Task Videos
Shih-Po Lee (Northeastern University) · Zijia Lu (Northeastern University) · Zekun Zhang (Stony Brook University) · Minh Hoai (State University of New York, Stony Brook) · Ehsan Elhamifar (None)
Robust Self-calibration of Focal Lengths from the Fundamental Matrix
Viktor Kocur (Comenius University in Bratislava) · Daniel Kyselica (Comenius University in Bratislava) · Zuzana Kukelova (Czech Technical University in Prague)
Learning to Control Camera Exposure via Reinforcement Learning
Kyunghyun Lee (LG AI Research) · Ukcheol Shin (Carnegie Mellon University (CMU)) · Byeong-Uk Lee (KRAFTON, Inc.)
Regressor-Segmenter Mutual Prompt Learning for Crowd Counting
Mingyue Guo (University of Chinese Academy of Sciences) · Li Yuan (Peking University) · Zhaoyi Yan (PengCheng Laboratory) · Binghui Chen (Alibaba Group) · Yaowei Wang (Pengcheng Laboratory) · Qixiang Ye (University of Chinese Academy of Sciences)
Efficient Detection of Long Consistent Cycles and its Application to Distributed Synchronization
Shaohan Li (University of Minnesota, Minneapolis) · Yunpeng Shi (University of California, Davis) · Gilad Lerman (University of Minnesota, Minneapolis)
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
Dongliang Cao (University of Bonn) · Marvin Eisenberger (Technical University Munich) · Nafie El Amrani (University of Bonn) · Daniel Cremers (Technical University Munich) · Florian Bernard (University of Bonn)
Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds
Heejoon Moon (HANYANG university) · Chunghwan Lee (Hanyang University) · Je Hyeong Hong (Hanyang University)
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
Chen Cheng (Nanyang Technological University) · Xiaofeng Yang (Nanyang Technological University) · Fan Yang (None) · Chengzeng Feng (Nanyang Technological University) · ZHOUJIE FU (Nanyang Technological University) · Chuan-Sheng Foo (Centre for Frontier AI Research, A*STAR) · Guosheng Lin (Nanyang Technological University) · Fayao Liu (Institute for Infocomm Research, A*STAR)
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. MOK (Alibaba DAMO Academy) · Zi Li (Alibaba DAMO Academy) · Yunhao Bai () · Jianpeng Zhang (None) · Wei Liu (Alibaba Group) · Yan-Jie Zhou (DAMO Academy, Alibaba Group) · Ke Yan (Alibaba DAMO Academy) · Dakai Jin (Alibaba Group) · Yu Shi (China Medical University Shenyang) · Xiaoli Yin (China Medical University Shenyang) · Le Lu (Alibaba Group) · Ling Zhang (Alibaba Group)
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Xi Liu (University of Electronic Science and Technology of China) · Ying Guo (Meituan) · Cheng Zhen (Meituan) · Tong Li (Meituan) · Yingying Ao (Meituan) · Pengfei Yan (Meituan)
Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI
Sean I. Young (Harvard Medical School / MIT) · Yaël Balbastre (Massachusetts General Hospital, Harvard Medical School) · Bruce Fischl (Massachusetts General Hospital, Harvard University) · Polina Golland (Massachusetts Institute of Technology) · Juan Iglesias (Harvard University)
View From Above: Orthogonal viewpoint aware Cross-view Localization
Shan Wang (ANU; CSIRO) · Chuong Nguyen (None) · Jiawei Liu (Australian National University) · Yanhao Zhang (University of Technology Sydney) · Sundaram Muthu (CSIRO) · Fahira Afzal Maken (CSIRO) · Kaihao Zhang (Australian National University) · Hongdong Li (Australian National University)
Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo
Zongrui Li (Nanyang Technological University) · Zhan Lu (Nanyang Technological University) · Haojie Yan (Zhejiang University) · Boxin Shi (Peking University) · Gang Pan (Zhejiang University) · Qian Zheng (Zhejiang University) · Xudong Jiang (Nanyang Technological University)
SEAS: ShapE-Aligned Supervision for Person Re-Identification
Haidong Zhu (University of Southern California) · Pranav Budhwant (University of Southern California) · Zhaoheng Zheng (University of Southern California) · Ram Nevatia (None)
LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
shanlin sun (University of California, Irvine) · Bingbing Zhuang (NEC Labs America) · Ziyu Jiang (Texas A&M) · Buyu Liu (NEC-Labs) · Xiaohui Xie (University of California, Irvine) · Manmohan Chandraker (UC San Diego)
LoS: Local Structure Guided Stereo Matching
Kunhong Li (SUN YAT-SEN UNIVERSITY) · Longguang Wang (National University of Defense Technology) · Ye Zhang (SUN YAT-SEN UNIVERSITY) · Kaiwen Xue (Huawei Cloud Computing Technologies Co., Ltd.) · Shunbo Zhou (Huawei Technologies Ltd.) · Yulan Guo (SUN YAT-SEN UNIVERSITY)
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Jonas Ricker (Ruhr University Bochum) · Denis Lukovnikov (Ruhr University Bochum) · Asja Fischer (Ruhr-Universität Bochum)
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
Xiaoyang Wu (The University of Hong Kong) · Zhuotao Tian (The Chinese University of Hong Kong) · Xin Wen (The University of Hong Kong) · Bohao Peng (The Chinese University of Hong Kong) · Xihui Liu (The University of Hong Kong) · Kaicheng Yu (Alibaba Group) · Hengshuang Zhao (The University of Hong Kong)
UniGS: Unified Representation for Image Generation and Segmentation
Lu Qi (University of California, Merced) · Lehan Yang (University of Sydney) · Weidong Guo (Tencent) · Yu Xu (University of Waterloo) · Bo Du (Wuhan University) · Varun Jampani (Google Research) · Ming-Hsuan Yang (University of California at Merced)
Meta-Point Learning and Refining for Category-Agnostic Pose Estimation
Junjie Chen (Jiangxi University of Finance and Economics) · Jiebin Yan (Jiangxi University of Finance and Economics) · Yuming Fang (Jiangxi University of Finance and Economics) · Li Niu ()
A Unified Framework for Human-centric Point Cloud Video Understanding
Yiteng Xu () · Kecheng Ye (None) · xiao han (ShanghaiTech University) · yiming ren (None) · Xinge Zhu (The Chinese University of Hong Kong) · Yuexin Ma (ShanghaiTech University)
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
Ce Zhang (Carnegie Mellon University) · Simon Stepputtis (Carnegie Mellon University) · Joseph Campbell (Carnegie Mellon University) · Katia Sycara (Carnegie Mellon University) · Yaqi Xie (CMU)
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation
Zifan Wang (Tsinghua University) · Junyu Chen (Tsinghua University) · Ziqing Chen (Tsinghua University) · Pengwei Xie (Electronic Engineering, Tsinghua University) · Rui Chen (Tsinghua University) · Li Yi ()
Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
Zhaoyang Sun (Wuhan University of Technology) · Shengwu Xiong (Wuhan University of Technology) · Yaxiong Chen (Wuhan University of Technology) · Yi Rong (Wuhan University of Technology)
GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen (The University of Hong Kong) · Mengmeng Xu (Meta AI) · Jiawei Ren (Nanyang Technological University) · Yuren Cong (Institute of Information Processing, Leibniz University Hanover) · Sen He (Meta AI) · Yanping Xie (Meta) · Animesh Sinha (Meta AI) · Ping Luo (The University of Hong Kong) · Tao Xiang (University of Surrey) · Juan-Manuel Pérez-Rúa (Meta AI)
Misalignment-Robust Frequency Distribution Loss for Image Transformation
Zhangkai Ni (Tongji University) · Juncheng Wu (Tongji University) · Zian Wang (Tongji University) · Wenhan Yang (Peng Cheng Lab) · Hanli Wang (Tongji University) · Lin Ma (Meituan)
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen (University of Texas at Austin) · Kumar Ashutosh (UT Austin & FAIR, Meta) · Rohit Girdhar (Meta) · David Harwath (University of Texas, Austin) · Kristen Grauman (University of Texas at Austin)
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu (Computer Science and Engineering Department, University of California, San Diego) · Sai Bi (Adobe Systems) · Zexiang Xu (Adobe Research) · Fujun Luan (Adobe Systems) · Kai Zhang (Adobe Systems) · Iliyan Georgiev (Adobe) · Kalyan Sunkavalli (Adobe Research) · Ravi Ramamoorthi (None)
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang (The Hong Kong University of Science and Technology) · Changkun Liu (Hong Kong University of Science and Technology) · Yipeng Zhu (Hong Kong University of Science and Technology) · Hui Cheng (SUN YAT-SEN UNIVERSITY) · Tristan Braud (Hong Kong University of Science and Technology) · Sai-Kit Yeung (The Hong Kong University of Science and Technology (HKUST))
LIVE: Online Large Video-Language Model for Streaming Video
Joya Chen (National University of Singapore) · Zhaoyang Lv (None) · Shiwei Wu (University of Science and Technology of China) · Kevin Qinghong Lin (National University of Singapore) · Chenan Song (National University of Singapore) · Difei Gao (None) · Jia-Wei Liu (National University of Singapore) · Ziteng Gao (National University of Singapore) · Dongxing Mao (SUTD) · Mike Zheng Shou (National University of Singapore)
Physical Property Understanding from Language-Embedded Feature Fields
Albert J. Zhai (University of Illinois at Urbana-Champaign) · Yuan Shen (University of Illinois at Urbana-Champaign) · Emily Y. Chen (University of Illinois Urbana Champaign) · Gloria Wang (Department of Computer Science) · Xinlei Wang (University of Illinois Urbana-Champaign) · Sheng Wang (University of Illinois Urbana-Champaign) · Kaiyu Guan (University of Illinois, Urbana Champaign) · Shenlong Wang (University of Illinois, Urbana Champaign)
Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu (Tianjin University) · Yang Sun (Tianjin University) · Bing Cao (Tianjin University) · Qinghua Hu (Tianjin University)
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Zhikai Chen (None) · Fuchen Long (JD.com) · Zhaofan Qiu (University of Science and Technology of China) · Ting Yao (JD AI Research) · Wengang Zhou (University of Science and Technology of China) · Jiebo Luo (University of Rochester) · Tao Mei (JD Explore Academy)
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard (Intel Labs) · Avinash Madasu (None) · Tiep Le (Intel) · Gustavo Lujan-Moreno (Intel) · Anahita Bhiwandiwalla (Intel) · Vasudev Lal (None)
Convolutional Prompting meets Language Models for Continual Learning
ANURAG Roy (IIT Kharagpur) · Riddhiman Moulick (Indian Institute of Technology Kharagpur) · Vinay Verma Verma (None) · Saptarshi Ghosh (Indian Institute of Technology Kharagpur) · Abir Das (Indian Institute of Technology Kharagpur)
Multiview Aerial Visual RECognition (MAVREC) Dataset: Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta (University of Central Florida) · Srijan Das (University of North Carolina at Charlotte) · Jacob Nielsen (University of Southern Denmark - SDU) · RAJATSUBHRA CHAKRABORTY (University of North Carolina at Charlotte) · Mubarak Shah (University of Central Florida)
Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering
Vivek Gopalakrishnan (MIT) · Neel Dey (Massachusetts Institute of Technology) · Polina Golland (Massachusetts Institute of Technology)
JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients
Woo Kyoung Han (Korea University) · Sunghoon Im (DGIST) · Jaedeok Kim (NVIDIA) · Kyong Hwan Jin (Korea University)
Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning
Tung Le (University of California, Irvine) · Khai Nguyen (UT Austin) · shanlin sun (University of California, Irvine) · Nhat Ho (University of Texas, Austin) · Xiaohui Xie (University of California, Irvine)
Generative Powers of Ten
Xiaojuan Wang (Department of Computer Science) · Janne Kontkanen (Research, Google) · Brian Curless (University of Washington) · Steve Seitz (University of Washington) · Ira Kemelmacher-Shlizerman (UW + Google) · Ben Mildenhall (Google) · Pratul P. Srinivasan (Google Research) · Dor Verbin (None) · Aleksander Holynski (UC Berkeley & Google Research)
SuperPrimitive: Scene Reconstruction at a Primitive Level
Kirill Mazur (Imperial College London) · Gwangbin Bae (Imperial College London) · Andrew J. Davison (Imperial College London)
Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
Renshuai Liu (Xiamen University) · Bowen Ma (NetEase, Inc.) · Wei Zhang (None) · Zhipeng Hu (Leihuo Game, NetEase) · Changjie Fan (Netease, Fuxi AI Lab) · Tangjie Lv (NetEase, Inc.) · Yu Ding (Fuxi AI Lab in Netease) · Xuan Cheng (Xiamen University)
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri (University of Alberta) · Tianjun Feng (University of Alberta) · Cor-Paul Bezemer (University of Alberta) · Anh Nguyen (Auburn University)
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
Kibum Kim (Korea Advanced Institute of Science and Technology) · Kanghoon Yoon (Korea Advanced Institute of Science & Technology) · Jaehyeong Jeon (Korea Advanced Institute of Science and Technology) · Yeonjun In (Korea Advanced Institute of Science & Technology) · Jinyoung Moon (ETRI) · Donghyun Kim (Korea University) · Chanyoung Park (Korea Advanced Institute of Science and Technology)
Geometrically-informed aggregation for zero-shot point cloud understanding
Guofeng Mei (Fondazione Bruno Kessler) · Luigi Riz (Fondazione Bruno Kessler) · Yiming Wang (Fondazione Bruno Kessler) · Fabio Poiesi (Fondazione Bruno Kessler)
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yonglu Li (Shanghai Jiaotong University) · Xiaoqian Wu (None) · Xinpeng Liu (Shanghai Jiao Tong University) · Zehao Wang (Shanghai Jiao Tong University) · Yiming Dou (University of Michigan - Ann Arbor) · Yikun Ji (Shanghai Jiaotong University) · Junyi Zhang (Shanghai Jiao Tong University) · Yixing Li (Shanghai Jiao Tong University) · Xudong LU (The Chinese University of Hong Kong) · Jingru Tan (Central South University) · Cewu Lu (Shanghai Jiao Tong University)
Learning Degradation-unaware Representation with Prior-based Latent Transformations for Blind Face Restoration
Lianxin Xie (South China University of Technology) · csbingbing zheng (South China University of Technology) · Wen Xue (South China University of Technology) · Le Jiang (South China University of Technology) · Cheng Liu (Shantou University) · Si Wu (South China University of Technology) · Hau San Wong (City University of Hong Kong)
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang (Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Liangxiao Hu (Harbin Institute of Technology) · Jiajun Zhang (Beijing University of Posts and Telecommunications) · Hongwei Yi (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Shengping Zhang (Harbin Institute of Technology) · Yebin Liu (Tsinghua University)
ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
Yichen Bai (None) · Zongbo Han (Tianjin University) · Bing Cao (Tianjin University) · Xiaoheng Jiang (Zhengzhou University) · Qinghua Hu (Tianjin University) · Changqing Zhang (Tianjin University)
Unsupervised 3D Structure Inference from Category-Specific Image Collections
Weikang Wang (Rheinische Friedrich-Wilhelms Universität Bonn) · Dongliang Cao (University of Bonn) · Florian Bernard (University of Bonn)
Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu (Chiba University) · Kazuhiko Kawamoto (Chiba University) · Hiroshi Kera (Chiba University)
Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields
Joshua Ahn (University of Chicago) · Haochen Wang (Toyota Technological Institute at Chicago) · Raymond A. Yeh (Purdue University) · Greg Shakhnarovich (Toyota Technological Institute at Chicago)
POPDG: Popular 3D Dance Generation with PopDanceSet
ZhenYe Luo (Beijing Normal University) · Min Ren (Beijing Normal University) · Xuecai Hu (Beijing Normal University) · Yongzhen Huang (Beijing Normal University) · Li Yao (Beijing Normal University)
Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
Qiyuan Dai (ShanghaiTech University) · Sibei Yang (None)
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
Zeyuan Yang (Tsinghua University) · LIU JIAGENG (None) · Peihao Chen (South China University of Technology) · Anoop Cherian (Mitsubishi Electric Research Labs (MERL)) · Tim Marks (None) · Jonathan Le Roux (Mitsubishi Electric Research Labs) · Chuang Gan (MIT-IBM Watson AI Lab)
Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Junkai Fan (Nanjing University of Science and Technology) · Jiangwei Weng (Nanjing University of Science and Technology) · Kun Wang (Nanjing University of Science and Technology) · Yijun Yang (None) · Jianjun Qian (Nanjing University of Science and Technology) · Jun Li (Nanjing University of Science and Technology) · Jian Yang (Nanjing University of Science and Technology)
SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
Jiaben Chen (University of California, San Diego) · Huaizu Jiang (Northeastern University)
Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca (EPFL) · Anshul Gupta (None) · Jean-marc Odobez (Swiss Federal Institute of Technology Lausanne)
Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection
Heng Zhang (Gaoling School of Artificial Intelligence, Renmin University of China) · Qiuyu Zhao (JD) · Linyu Zheng (JD) · Hao Zeng (JD.com) · Zhiwei Ge (JD) · Tianhao Li (JD) · Sulong Xu (JD)
Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing
Bi'an Du (None) · Xiang Gao (Peking University) · Wei Hu (None) · Renjie Liao (University of British Columbia)
Implicit Motion Function
Yue Gao (Microsoft Research) · Jiahao Li (Microsoft Research Asia) · Lei Chu (Microsoft Research Asia) · Yan Lu (Microsoft Research Asia)
MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation
Sumanth Udupa (Indian Institute of Science) · Prajwal Gurunath (Indian Institute of Science) · Aniruddh Sikdar (Indian Institute of Science) · Suresh Sundaram (Indian Institute of Science, Bangalore)
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin (Delft University of Technology) · Zimin Xia (Motional)
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images
Guanlin Shen (Tsinghua University) · Jingwei Huang (Huawei Technologies Ltd.) · Zhihua Hu (Nanjing University of Information Science and Technology) · Bin Wang (Tsinghua University)
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong () · Zishuo Zheng (None) · Peihao Chen (South China University of Technology) · Yian Wang (Department of Computer Science, University of Massachusetts at Amherst) · Junyan Li (Zhejiang University) · Chuang Gan (MIT-IBM Watson AI Lab)
Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation
Xin Kang () · Lei Chu (Microsoft Research Asia) · Jiahao Li (Microsoft Research Asia) · Xuejin Chen (University of Science and Technology of China) · Yan Lu (Microsoft Research Asia)
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao (Tsinghua University) · Zhuoyan Luo (Tsinghua University) · Yong Liu (None) · Yue Ma (Tsinghua University) · Hengwei Bian (Carnegie Mellon University) · Yatai Ji (None) · Yujiu Yang (Tsinghua University) · Xiu Li (Tsinghua University)
ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez (Ecole Normale Supérieure de Paris) · Ihab Bendidi (Ecole Normale Superieure) · Cohen Ethan (Ecole Normale Supérieure de Paris) · Gabriel Watkinson (Ecole Normale Supérieure de Paris) · Maxime Sanchez (IBENS) · Guillaume Bollot (Synsight company) · Auguste Genovesio (Ecole Normale Supérieure de Paris)
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Sherwin Bahmani (None) · Ivan Skorokhodov (KAUST) · Victor Rong (University of Toronto) · Gordon Wetzstein (Stanford University) · Leonidas Guibas (Stanford University) · Peter Wonka (KAUST) · Sergey Tulyakov (Snap Inc.) · Jeong Joon Park (Stanford University) · Andrea Tagliasacchi (Simon Fraser University, Google Brain) · David B. Lindell (University of Toronto)
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu (National Taiwan University) · Andy Chia-Hao Chang (National Yang Ming Chiao Tung University) · Chieh-Yu Chuang (National Yang Ming Chiao Tung University) · Chun-Pei Chen (National Yang Ming Chiao Tung University) · Yu-Lun Liu (National Yang Ming Chiao Tung University) · Min-Hung Chen (NVIDIA) · Hou-Ning Hu (MediaTek Inc.) · Yung-Yu Chuang (National Taiwan University) · Yen-Yu Lin (National Yang Ming Chiao Tung University)
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi (Korea University) · Sanghyeok Lee (Korea University) · Jaewon Chu (Korea University) · Minhyuk Choi (Korea University) · Hyunwoo J. Kim (Korea University)
Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes
Liqiong Wang (China Three Gorges University) · Jinyu Yang (University of Birmingham) · Yanfu Zhang (College of William and Mary) · Fangyi Wang (China Three Gorges University) · Feng Zheng (Southern University of Science and Technology)
Diffusion Models Without Attention
Jing Nathan Yan (Cornell University) · Jiatao Gu (Apple (MLR)) · Alexander Rush (Cornell Tech)
DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Khawar Islam (FloppyDisk.AI) · Muhammad Zaigham Zaheer (Mohamed bin Zayed University of Artificial Intelligence) · Arif Mahmood (Information Technology University, Lahore) · Karthik Nandakumar (Mohamed Bin Zayed University of Artificial Intelligence)
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu (Tsinghua University) · Quan Sun (BAAI) · Xiaosong Zhang (Beijing Academy of Artificial Intelligence) · Yufeng Cui (Beihang University) · Fan Zhang (Beijing Academy of Artificial Intelligence) · Yue Cao (Beijing Academy of Artificial Intelligence) · Xinlong Wang (Beijing Academy of Artificial Intelligence) · Jingjing Liu (Tsinghua University)
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih (University of Washington) · Wei-Chiu Ma (Cornell University) · Lorenzo Boyice (Google) · Aleksander Holynski (UC Berkeley & Google Research) · Forrester Cole (Google) · Brian Curless (University of Washington) · Janne Kontkanen (Research, Google)
Real-Time Simulated Avatar from Head-Mounted Sensors
Zhengyi Luo (Carnegie Mellon University) · Jinkun Cao (Carnegie Mellon University) · Rawal Khirodkar (Meta) · Alexander Winkler (Meta) · Jing Huang (Facebook) · Kris Kitani (Carnegie Mellon University) · Weipeng Xu (Meta Reality Labs Research)
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen (Australian National University) · Tianyu Wang (Australian National University) · Jose M. Alvarez (NVIDIA) · Miaomiao Liu (Australian National University)
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal (Tel Aviv University, NVIDIA) · Yael Vinker (Tel Aviv University) · Yuval Alaluf (Tel Aviv University) · Amit H. Bermano (Tel Aviv University, Technion) · Daniel Cohen-Or (Google) · Ariel Shamir (Reichman University) · Gal Chechik (NVIDIA, Bar-Ilan University)
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang (The Hong Kong University of Science and Technology) · Qixiang ZHANG (Hong Kong University of Science and Technology) · Yi Li (Hong Kong University of Science and Technology) · Xiaomeng Li (The Hong Kong University of Science and Technology)
Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping
Peng Sun (Hunan University) · Xinyang Liu (Hong Kong Polytechnic University) · Zhibo Wang (Zhejiang University) · Bo Liu (Shenzhen Institute of Artificial Intelligence and Robotics for Society)
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
Kranthi Kumar Rachavarapu (Indian Institute of Technology Madras) · Kalyan Ramakrishnan (University of Oxford) · A. N. Rajagopalan (Indian Institute of Technology Madras)
State Space Models for Event Cameras
Nikola Zubic (Robotics and Perception Group, University of Zurich and ETH Zurich) · Mathias Gehrig (University of Zurich) · Davide Scaramuzza (University of Zurich)
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk (None) · Jaesung Huh (University of Oxford) · Evangelos Kazakos (Czech Technical University of Prague) · Andrew Zisserman (University of Oxford) · Dima Damen (University of Bristol and Google DeepMind)
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
Takeru Oba (Toyota Technological Institute) · Matthew Walter (Toyota Technological Institute at Chicago) · Norimichi Ukita (Toyota Technological Institute)
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li (SIST, ShanghaiTech University) · Songyang Zhang (Shanghai AI Laboratory) · Dahua Lin (The Chinese University of Hong Kong) · Kai Chen (Shanghai AI Laboratory) · Xuming He (ShanghaiTech University)
Spectrum AUC Difference (SAUCD): Human Aligned 3D Shape Evaluation
Tianyu Luan (State University of New York at Buffalo) · Zhong Li (InnoPeak Technology) · Lele Chen (Sony America) · Xuan Gong (Harvard University) · Lichang Chen (Department of Computer Science, University of Maryland, College Park) · Yi Xu (OPPO US Research Center) · Junsong Yuan (State University of New York at Buffalo)
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick (None) · Guangxing Han (Columbia University) · Rui Hou (Meta Inc.) · Sayan Nag (University of Toronto) · Ser-Nam Lim (Meta AI) · Nicolas Ballas (Facebook) · Qifan Wang (Meta AI) · Rama Chellappa (Johns Hopkins University) · Amjad Almahairi (Facebook)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury (None) · Sayan Nag (University of Toronto) · Joseph K J (Adobe Research) · Balaji Vasan Srinivasan (Adobe Research) · Dinesh Manocha (University of Maryland, College Park)
Multi-Space Alignments Towards Universal LiDAR Segmentation
Youquan Liu (Hochschule Bremerhaven) · Lingdong Kong (National University of Singapore) · Xiaoyang Wu (The University of Hong Kong) · Runnan Chen (None) · Xin Li (East China Normal University) · Liang Pan (Shanghai AI Lab) · Ziwei Liu (Nanyang Technological University) · Yuexin Ma (ShanghaiTech University)
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
Yunxiao Shi (Qualcomm AI Research) · Manish Singh (Qualcomm AI Research) · Hong Cai (Qualcomm AI Research) · Fatih Porikli (Qualcomm)
Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
Dipam Goswami (Computer Vision Center) · Albin Soutif (Computer Vision Center, Universitat Autònoma de Barcelona) · Yuyang Liu (Shenyang Institute of Automation, Chinese Academy of Sciences / University of Chinese Academy of Sciences) · Sandesh Kamath (Computer Vision Center, Universitat Autònoma de Barcelona) · Bartłomiej Twardowski (Computer Vision Center / IDEAS NCBR) · Joost van de Weijer (Computer Vision Center Barcelona)
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen (University of California, Merced) · Aliaksandr Siarohin (Snap Inc.) · Willi Menapace (University of Trento) · Ekaterina Deyneka (Snap Inc.) · Hsiang-wei Chao (Snap Inc.) · Byung Jeon (Snap Inc.) · Yuwei Fang (Snap Inc.) · Hsin-Ying Lee (Snap Inc.) · Jian Ren (Snap Inc.) · Ming-Hsuan Yang (University of California at Merced) · Sergey Tulyakov (Snap Inc.)
HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
Mengcheng Li (Tsinghua University) · Hongwen Zhang (Beijing Normal University) · Yuxiang Zhang (Tsinghua University) · Ruizhi Shao (Tsinghua University) · Tao Yu (Tsinghua University) · Yebin Liu (Tsinghua University)
FSC: Few-point Shape Completion
Xianzu Wu (Jianghan University) · Xianfeng Wu (Jianghan University) · Tianyu Luan (State University of New York at Buffalo) · Yajing Bai (Jianghan University) · Zhongyuan Lai (Jianghan University) · Junsong Yuan (State University of New York at Buffalo)
Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models
Xingqian Xu (University of Illinois, Urbana Champaign) · Jiayi Guo (Tsinghua University) · Zhangyang Wang (University of Texas at Austin) · Gao Huang (Tsinghua University) · Irfan Essa (Georgia Institute of Technology) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
Holistic Features are almost Sufficient for Text-to-Video Retrieval
Kaibin Tian (None) · Ruixiang Zhao (None) · Zijie Xin (Sichuan University) · Bangxiang Lan (Renmin University of China) · Xirong Li (Renmin University of China)
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Yuheng Jiang (ShanghaiTech University) · Zhehao Shen (ShanghaiTech University) · Penghao Wang (None) · Zhuo Su (ByteDance) · Yu Hong (ShanghaiTech University) · Yingliang Zhang (DGene Inc.) · Jingyi Yu (ShanghaiTech University) · Lan Xu (ShanghaiTech University)
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Yushuang Wu (The Chinese University of Hong Kong (Shenzhen)) · Luyue Shi (The Chinese University of Hong Kong, Shenzhen) · Junhao Cai (Hong Kong University of Science and Technology) · Weihao Yuan (Alibaba Group) · Lingteng Qiu (None) · Zilong Dong (Alibaba Group) · Liefeng Bo (None) · Shuguang Cui (The Chinese University of Hong Kong, Shenzhen) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
Doubly Abductive Counterfactual Inference for Text-based Image Editing
Xue Song (Fudan University) · Jiequan Cui (Nanyang Technological University) · Hanwang Zhang (Nanyang Technological University) · Jingjing Chen (Fudan University) · Richang Hong (Hefei University of Technology) · Yu-Gang Jiang (Fudan University)
SD2Event: Self-supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras
Yuan Gao (University of Science and Technology of China) · Yuqing Zhu (University of Science and Technology of China) · Xinjun Li (University of Science and Technology of China) · Yimin Du (University of Science and Technology of China) · Tianzhu Zhang (University of Science and Technology of China)
ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks
Kai Han (Huawei Noah's Ark Lab) · Yunhe Wang (Huawei Noah's Ark Lab) · Jianyuan Guo (University of Sydney) · Enhua Wu (University of Macau)
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah (University of Maryland, College Park) · Sreyan Ghosh (University of Maryland, College Park) · Sonal Kumar (University of Maryland, College Park) · Purva Chiniya (University of Maryland, College Park) · Dinesh Manocha (University of Maryland, College Park)
E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator
Wenjun Wu (None) · Lingling Zhang (Xi'an Jiaotong University) · Jun Liu (Xi'an Jiaotong University) · Xi Tang (Xi'an Jiaotong University) · Yaxian Wang (Xi'an Jiaotong University) · Shaowei Wang (Xi'an Jiaotong University) · QianYing Wang (Lenovo Group)
CAGE: Controllable Articulation GEneration
Jiayi Liu (Simon Fraser University) · Hou In Ivan Tam (Simon Fraser University) · Ali Mahdavi Amiri (Simon Fraser University) · Manolis Savva (Simon Fraser University)
Fine-grained Bipartite Concept Factorization for Clustering
Chong Peng (None) · Pengfei Zhang (Qingdao University) · Yongyong Chen (Harbin Institute of Technology (Shenzhen)) · Zhao Kang (University of Electronic Science and Technology of China) · Chenglizhao Chen (China University of Petroleum) · Qiang Cheng (University of Kentucky)
StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation
Sidi Wu (ETH Zurich) · Yizi Chen (ETHZ - ETH Zurich) · Loic Landrieu (ENPC, IGN) · Nicolas Gonthier (IGN) · Samuel Mermet (Ecole Nationale des Sciences Géographiques) · Lorenz Hurni (ETHZ - ETH Zurich) · Konrad Schindler (ETH Zurich)
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Prashant Kumar (Indian Institute of Technology Delhi) · Kshitij Madhav Bhat (Indian Institute of Technology Indore) · Vedang Bhupesh Shenvi Nadkarni (Birla Institute of Technology and Science Pilani (BITS Pilani)) · Prem Kalra (Indian Institute of Technology, Delhi)
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang (Northwest Polytechnical University Xi'an) · Ruizhi Shao (Tsinghua University) · Qi Zhang (Northwest Polytechnical University Xi'an) · Hongwen Zhang (Beijing Normal University) · Ying Feng (Northwest Polytechnical University Xi'an) · Yebin Liu (Tsinghua University) · Qing Wang (Northwestern Polytechnical University)
ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation
Khoi D Nguyen (University of Wisconsin - Madison) · Chen Li (National University of Singapore) · Gim Hee Lee (National University of Singapore)
ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models
Fei Kong (University of Electronic Science and Technology of China) · Jinhao Duan (Drexel University) · Lichao Sun (Lehigh University) · Hao Cheng (Hong Kong University of Science and Technology (Guangzhou)) · Renjing Xu (Hong Kong University of Science and Technology (Guangzhou)) · Heng Tao Shen (University of Electronic Science and Technology of China) · Xiaofeng Zhu (University of Electronic Science and Technology of China) · Xiaoshuang Shi (University of Electronic Science and Technology of China) · Kaidi Xu (Drexel University)
Unbiased Estimator for Distorted Conic in Camera Calibration
Chaehyeon Song (Seoul National University) · Jaeho Shin (Seoul National University) · Myung-Hwan Jeon (Seoul National University) · Jongwoo Lim (Seoul National University) · Ayoung Kim (Seoul National University)
Accurate Training Data for Occupancy Map Prediction in Automated Driving using Evidence Theory
Jonas Kälble (Bosch Center for Artificial Intelligence) · Sascha Wirges (Robert Bosch GmbH, Bosch) · Maxim Tatarchenko (Bosch) · Eddy Ilg (None)
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
Chao Xu (Zhejiang University) · Yang Liu (Alibaba Group) · Jiazheng Xing (Zhejiang University) · Weida Wang (Xingji Meizu Group) · Mingze Sun (None) · Jun Dan (Zhejiang University) · Tianxin Huang (Tencent Youtu Lab) · Siyuan Li (Westlake University, Zhejiang University) · Zhi-Qi Cheng (Carnegie Mellon University) · Ying Tai (Nanjing University) · Baigui Sun (Alibaba Group)
Backdoor Defense via Test-Time Detecting and Repairing
Jiyang Guan (Institute of Automation, Chinese Academy of Sciences) · Jian Liang (Institute of Automation, Chinese Academy of Sciences) · Ran He (None)
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
Zhixuan Liang (The University of Hong Kong) · Yao Mu (The University of Hong Kong) · Hengbo Ma (None) · Masayoshi Tomizuka (University of California, Berkeley) · Mingyu Ding (UC Berkeley) · Ping Luo (The University of Hong Kong)
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia (Baidu) · Xinliang Wang (Baidu) · Feng Lv (Baidu) · Xin Hao (Beijing Institute of Technology) · Yifeng Shi (Baidu)
ZONE: Zero-Shot Instruction-Guided Local Editing
Shanglin Li (Beijing University of Aeronautics and Astronautics) · Bohan Zeng (Beijing University of Aeronautics and Astronautics) · Yutang Feng (Beijing University of Aeronautics and Astronautics) · Sicheng Gao (Bayerische Julius-Maximilians-Universität Würzburg) · Xuhui Liu (Beihang University) · Jiaming Liu (Xiaohongshu) · Li Lin (Xiamen University) · Xu Tang (ShanghaiTech University) · Yao Hu (Zhejiang University, Tsinghua University) · Jianzhuang Liu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Baochang Zhang (Beihang University)
HIVE: Harnessing Human Feedback for Instructional Visual Editing
Shu Zhang (SalesForce.com) · Xinyi Yang (Salesforce Research) · Yihao Feng (Salesforce Research) · Can Qin (Northeastern University) · Chia-Chih Chen (Salesforce) · Ning Yu (Salesforce Research) · Zeyuan Chen (SalesForce.com) · Huan Wang (SalesForce.com) · Silvio Savarese (Salesforce) · Stefano Ermon (Stanford University) · Caiming Xiong (Salesforce Research) · Ran Xu (SalesForce.com)
Clustering for Protein Representation Learning
Ruijie Quan (Zhejiang University) · Wenguan Wang (Zhejiang University) · Fan Ma (None) · Hehe Fan (None) · Yi Yang (Zhejiang University)
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen (Karlsruhe Institute of Technology (KIT)) · Jiaming Zhang (KIT) · Kunyu Peng (KIT) · Junwei Zheng (Karlsruhe Institute of Technology) · Ruiping Liu (Karlsruher Institut für Technologie) · Philip H.S. Torr (University of Oxford) · Rainer Stiefelhagen (Karlsruhe Institute of Technology)
What Sketch Explainability Really Means for Downstream Tasks?
Hmrishav Bandyopadhyay (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Aneeshan Sain (University of Surrey) · Tao Xiang (University of Surrey) · Yi-Zhe Song (None)
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran (National University of Singapore) · Xiaodong Cun (Tencent AI Lab) · Jia-Wei Liu (National University of Singapore) · Rui Zhao (None) · Song Zijie (Fudan University) · Xintao Wang (Tencent) · Jussi Keppo (National University of Singapore) · Mike Zheng Shou (National University of Singapore)
SURE: SUrvey REcipes for building reliable and robust deep networks
Yuting Li (China Three Gorges University) · Yingyi Chen (Department of Electrical Engineering, KU Leuven, Belgium, KU Leuven) · Xuanlong Yu (Université Paris-Saclay) · Dexiong Chen (Max Planck Institute of Biochemistry) · Xi Shen (Tencent AI Lab)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang (Northwestern University) · Jong-Chyi Su (None) · Samuel Schulter (NEC Laboratories America) · Sparsh Garg (NEC Laboratories America) · Shiyu Zhao (Rutgers University, New Brunswick) · Ying Wu (Northwestern University) · Manmohan Chandraker (UC San Diego)
Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis
Tianci Bi (Xi'an Jiaotong University) · Xiaoyi Zhang (Research, Microsoft) · Zhizheng Zhang (Microsoft Research) · Wenxuan Xie (Microsoft Research Asia) · Cuiling Lan (Microsoft) · Yan Lu (Microsoft Research Asia) · Nanning Zheng (Xi'an Jiaotong University)
Label Propagation for Zero-shot Classification with Vision-Language Models
Vladan Stojnić (Czech Technical University in Prague) · Yannis Kalantidis (NAVER LABS Europe) · Giorgos Tolias (CTU in Prague)
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
Zhipeng Du (University of Edinburgh & King's College London) · Miaojing Shi (King's College London) · Jiankang Deng (Imperial College London & Huawei UKRD)
CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images
Aaron Gokaslan (Cornell University) · A. Feder Cooper (Cornell University) · Jasmine Collins (University of California Berkeley) · Landan Seguin (Databricks) · Austin Jacobson (Databricks) · Mihir Patel (Databricks MosaicML) · Jonathan Frankle (School of Engineering and Applied Sciences, Harvard University) · Cory Stephenson (Databricks) · Volodymyr Kuleshov (Cornell University)
LEOD: Label-Efficient Object Detection for Event Cameras
Ziyi Wu (University of Toronto) · Mathias Gehrig (University of Zurich) · Qing Lyu (University of Toronto) · Xudong Liu (None) · Igor Gilitschenski (University of Toronto)
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei (Johns Hopkins University) · Mauricio Delbracio (None) · Hossein Talebi (Google Research) · Zhengzhong Tu (University of Texas at Austin) · Vishal M. Patel (Johns Hopkins University) · Peyman Milanfar (Peyman Milanfar)
Unsegment Anything by Simulating Deformation
Jiahao Lu (National University of Singapore) · Xingyi Yang (National University of Singapore) · Xinchao Wang (National University of Singapore)
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong (National University of Singapore) · Youquan Liu (Hochschule Bremerhaven) · Lai Xing Ng (Institute for Infocomm Research (I2R), A*STAR) · Benoit Cottereau (CNRS) · Wei Tsang Ooi (National University of Singapore)
$\mathcal{Z}^*$: Zero-shot $\underline{S}$tyle $\underline{T}$ransfer via $\underline{A}$ttention $\underline{R}$eweighting
Yingying Deng (None) · Xiangyu He (Meituan) · Fan Tang (Institute of Computing Technology, CAS) · Weiming Dong (Institute of Automation, Chinese Academy of Sciences)
VAREN: Very Accurate and Realistic Equine Network
Silvia Zuffi (IMATI-CNR) · Ylva Mellbin (Swedish University of Agricultural Sciences) · Ci Li (KTH Royal Institute of Technology) · Markus Höschle (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Hedvig Kjellström (KTH Royal Institute of Technology) · Senya Polikovsky (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Elin Hernlund (Swedish University of Agricultural Sciences) · Michael J. Black (University of Tübingen)
Cross-spectral Gated-RGB Stereo Depth Estimation
Samuel Brucker (Torc Robotics) · Stefanie Walz (Mercedes-Benz AG) · Mario Bijelic (Princeton University) · Felix Heide (Department of Computer Science, Princeton University)
Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
Qingping Zheng (Northwestern Polytechnical University) · Ling Zheng (Tsinghua-Fuzhou Institute for Data Technology) · Yuanfan Guo (Huawei Technologies Ltd.) · Ying Li (Northwestern Polytechnical University) · Songcen Xu (Huawei Noah's Ark Lab) · Jiankang Deng (Imperial College London & Huawei UKRD) · Hang Xu (Huawei Noah's Ark Lab)
LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
Dat NGUYEN (University of Luxembourg) · Nesryne Mejri (SnT, University of Luxembourg) · Inder Pal Singh (University of Luxembourg) · Polina Kuleshova (University of Luxembourg) · Marcella Astrid (University of Luxembourg) · Anis Kacem (University of Luxembourg) · Enjie Ghorbel (CRISTAL laboratory, ENSI, University of Manouba) · Djamila Aouada (SnT, University of Luxembourg)
EASE-DETR: Easing the Competition among Object Queries
Yulu Gao (Beijing University of Aeronautics and Astronautics) · Yifan Sun (Baidu Research) · Xudong Ding (Beijing University of Aeronautics and Astronautics) · Chuyang Zhao (Baidu) · Si Liu (Beihang University)
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco (Università degli Studi di Genova, Istituto Italiano di Tecnologia) · Stefano Berti (Università degli Studi di Genova, Istituto Italiano di Tecnologia) · Giulia Pasquale (Istituto Italiano di Tecnologia) · Damiano Malafronte (Istituto Italiano di Tecnologia) · Shogo Sato (Sony Interactive Entertainment Inc.) · Hiroyuki Segawa (Sony Interactive Entertainment) · Tetsugo Inada (Sony Interactive Entertainment) · Lorenzo Natale (Istituto Italiano di Tecnologia)
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving
Brian Yang (School of Computer Science, Carnegie Mellon University) · Huangyuan Su (Computer Science, School of Engineering and Applied Sciences, Harvard University) · Nikolaos Gkanatsios (Carnegie Mellon University) · Tsung-Wei Ke (CMU, Carnegie Mellon University) · Ayush Jain (Carnegie Mellon University) · Jeff Schneider (Carnegie Mellon University) · Katerina Fragkiadaki (CMU)
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang (University of Hong Kong) · Bo Wu (MIT-IBM Watson AI Lab) · Sunli Chen (Tsinghua University) · Zhenfang Chen (MIT-IBM Watson AI lab) · Haotian Guan (The University of Hong Kong) · Wei-Ning Lee (University of Hong Kong) · Li Erran Li (AWS AI, Amazon) · Chuang Gan (MIT-IBM Watson AI Lab)
ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
Xiangjun Gao (The Hong Kong University of Science and Technology (HKUST)) · Xiaoyu Li (Tencent AI Lab) · Chaopeng Zhang (Tencent AI Lab) · Qi Zhang (Tencent AI Lab) · Yan-Pei Cao (Tencent ARC Lab) · Ying Shan (Tencent) · Long Quan (The Hong Kong University of Science and Technology)
Flow-Guided Online Stereo Rectification for Wide Baseline Stereo
Anush Kumar (Torc Robotics) · Fahim Mannan () · Omid Hosseini Jafari (Torc Robotics) · Shile Li (Torc Robotics) · Felix Heide (Department of Computer Science, Princeton University)
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li (Huazhong University of Science and Technology) · Laurence Yang (Hainan University) · Bocheng Ren (None) · Xin Nie (Huazhong University of Science and Technology) · Zhangyang Gao (Westlake University, China) · Cheng Tan (Zhejiang University & Westlake University) · Stan Z. Li (Westlake University)
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
Tz-Ying Wu (University of California, San Diego) · Chih-Hui Ho (University of California San Diego) · Nuno Vasconcelos (University of California San Diego)
Mosaic-SDF for 3D Generative Models
Lior Yariv (Weizmann Institute of Science) · Omri Puny (Weizmann Institute of Science) · Oran Gafni (Meta AI) · Yaron Lipman (Facebook)
FreeMan: Towards benchmarking 3D human pose estimation under Real-World Conditions
Jiong WANG (Fudan University) · Fengyu Yang (Chinese University of Hong Kong (Shenzhen)) · Bingliang Li (The Chinese University of Hong Kong (Shenzhen)) · Wenbo Gou (Carnegie Mellon University) · Danqi Yan (The Chinese University of Hong Kong Shenzhen) · Ailing Zeng (IDEA) · Yijun Gao (Tencent Turing Lab) · Junle Wang (Tencent) · Yanqing Jing (Tencent) · Ruimao Zhang (The Chinese University of Hong Kong (Shenzhen))
Characteristics Matching Based Hash Codes Generation for Efficient Fine-grained Image Retrieval
Zhen-Duo Chen (Shandong University) · Li-Jun Zhao (Shandong University) · Zi-Chao Zhang (Shandong University) · Xin Luo (Shandong University) · Xin-Shun Xu (Shandong University)
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Bingyan Liu (South China University of Technology) · Chengyu Wang (Alibaba Group) · Tingfeng Cao (South China University of Technology) · Kui Jia (South China University of Technology) · Jun Huang (University of Science and Technology of China)
DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
Huiqiang Sun (None) · Xingyi Li (Huazhong University of Science and Technology) · Liao Shen (Huazhong University of Science and Technology) · Xinyi Ye (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology) · Ke Xian (Nanyang Technological University) · Zhiguo Cao ()
TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer (Technical University of Munich (TUM)) · Gerhard Arya Wardana (Department of Informatics, Technische Universität München) · Suren Sritharan (Technische Universität München) · Xingcheng Zhou (Technical University of Munich) · Rui Song (Technical University of Munich) · Alois Knoll (Technical University Munich)
Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models
Matthew Kowal (York University) · Richard P. Wildes (York University) · Kosta Derpanis (York University/Samsung)
A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification
Zexian Yang (None) · Dayan Wu (IIE, CAS) · Chenming Wu (None) · Zheng Lin (Institute of Information Engineering, Chinese Academy of Sciences) · Jingzi Gu (Institute of Information Engineering, CAS) · Weiping Wang (IIE)
HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
Linglin Jing (Loughborough University) · Yiming Ding (Fudan University) · Yunpeng Gao (Northwest Polytechnical University Xi'an) · Zhigang Wang (Shanghai AI Lab) · Xu Yan (None) · Dong Wang (Shanghai AI Laboratory) · Gerald Schaefer (Loughborough University) · Hui Fang (Loughborough University) · Bin Zhao (Northwest Polytechnical University Xi'an) · Xuelong Li (Northwestern Polytechnical University)
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
Taeheon Kim (Korea Advanced Institute of Science & Technology) · Sebin Shin (KAIST) · Youngjoon Yu (Korea Advanced Institute of Science and Technology (KAIST)) · Hak Gu Kim (Chung-Ang University) · Yong Man Ro (Korea Advanced Institute of Science and Technology)
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
Yijia Weng (Stanford University) · Bowen Wen (NVIDIA) · Jonathan Tremblay (NVIDIA) · Valts Blukis (NVIDIA) · Dieter Fox (University of Washington) · Leonidas Guibas (Stanford University) · Stan Birchfield (NVIDIA)
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Arthur Moreau (Huawei Noah's Ark Lab) · Jifei Song (Huawei Technologies Ltd.) · Helisa Dhamo (None) · Richard Shaw (Huawei Technologies Ltd.) · Yiren Zhou (Huawei Technologies Ltd.) · Eduardo Pérez-Pellitero (Huawei Noah's Ark Lab (UK))
Learning to Remove Wrinkled Transparent Film with Polarized Prior
Jiaqi Tang (Hong Kong University of Science and Technology (Guangzhou)) · RUIZHENG WU (Smartmore Technology) · Xiaogang Xu (Zhejiang Lab) · Sixing Hu (Smartmore Corporation) · Ying-Cong Chen (The Hong Kong University of Science and Technology)
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao (SUN YAT-SEN UNIVERSITY) · Zeyu Wang (University of California, Santa Cruz) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY) · Cihang Xie (University of California, Santa Cruz) · Yuyin Zhou (UC Santa Cruz)
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim (Michigan State University) · Feng Liu (Michigan State University) · Yiyang Su (None) · Anil Jain (Michigan State University) · Xiaoming Liu (None)
Training Vision Transformers for Semi-Supervised Semantic Segmentation
Xinting Hu (Nanyang Technological University) · Li Jiang (Max Planck Institute for Informatics) · Bernt Schiele (Max Planck Institute for Informatics)
Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li (Carnegie Mellon University) · Jinglu Wang (Microsoft Research Asia) · Xiaohao Xu (University of Michigan - Ann Arbor) · Xiulian Peng (Microsoft Research Asia) · Rita Singh (School of Computer Science, Carnegie Mellon University) · Yan Lu (Microsoft Research Asia) · Bhiksha Raj (Carnegie Mellon University)
MoST: Multi-modality Scene Tokenization for Motion Prediction
Norman Mu (UC Berkeley) · Jingwei Ji (Waymo LLC) · Zhenpei Yang (Waymo LLC) · Nathan Harada (Waymo LLC) · Haotian Tang (Massachusetts Institute of Technology) · Kan Chen (Waymo) · Charles R. Qi (Waymo) · Runzhou Ge (Waymo) · Kratarth Goel (Waymo) · Zoey Yang (Waymo) · Scott Ettinger (Waymo LLC) · Rami Al-Rfou (Waymo) · Dragomir Anguelov (Waymo) · Yin Zhou (Waymo)
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling
Xinhang Liu (HKUST) · Yu-Wing Tai (None) · Chi-Keung Tang (The Hong Kong University of Science and Technology) · Pedro Miraldo (None) · Suhas Lohit (Mitsubishi Electric Research Labs) · Moitreya Chatterjee (Mitsubishi Electric Research Labs)
NECA: Neural Customizable Human Avatar
Junjin Xiao (School of Computer Science and Engineering, Sun Yat-sen University) · Qing Zhang (SUN YAT-SEN UNIVERSITY) · Zhan Xu (None) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY)
VLP: Vision Language Planning for Autonomous Driving
Chenbin Pan (Syracuse University) · Burhaneddin Yaman (Bosch Center for Artificial Intelligence) · Tommaso Nesti (None) · Abhirup Mallik (Bosch) · Alessandro G Allievi (Bosch / University of Texas at Austin) · Senem Velipasalar (Syracuse University) · Liu Ren (Bosch Research)
Adversarial Text to Continuous Image Generation
Kilichbek Haydarov (King Abdullah University of Science and Technology) · Aashiq Muhamed (CMU, Carnegie Mellon University) · Xiaoqian Shen (King Abdullah University of Science and Technology) · Jovana Lazarevic (University of Novi Sad) · Ivan Skorokhodov (KAUST) · Chamuditha Jayanga Galappaththige (Queensland University of Technology) · Mohamed Elhoseiny (KAUST)
Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM
Linyu Tang (Chongqing University) · Lei Zhang (Chongqing University)
Inversion-Free Image Editing with Language-Guided Diffusion Models
Sihan Xu (University of Michigan - Ann Arbor) · Yidong Huang (University of Michigan - Ann Arbor) · Jiayi Pan (University of California, Berkeley) · Ziqiao Ma (University of Michigan) · Joyce Chai (University of Michigan)
Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
Menghao Zhang (Beijing University of Posts and Telecommunications) · Jingyu Wang (Beijing University of Post and Telecommunication, Tsinghua University) · Qi Qi (Beijing University of Posts and Telecommunications) · Haifeng Sun (Beijing University of Posts and Telecommunications) · Zirui Zhuang (Beijing University of Posts and Telecommunications) · Pengfei Ren (Beijing University of Posts and Telecommunications) · Ruilong Ma (Beijing University of Posts and Telecommunications) · Jianxin Liao (Beijing University of Posts and Telecommunications)
Uncertainty-aware Action Decoupling Transformer for Action Anticipation
Hongji Guo (Rensselaer Polytechnic Institute) · Nakul Agarwal (Honda Research Institute USA) · Shao-Yuan Lo (Johns Hopkins University) · Kwonjoon Lee (Honda Research Institute USA) · Qiang Ji (Rensselaer Polytechnic Institute)
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization
Xiangyu Yin (University of Liverpool) · Wenjie Ruan (University of Exeter)
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Xiefan Guo (Beihang University) · Jinlin Liu (Alibaba Group) · Miaomiao Cui (Alibaba Group) · Jiankai Li (Beihang University) · Hongyu Yang (Beihang University) · Di Huang (Beihang University)
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Chen Sichen (Shanghai Jiao Tong University) · Yingyi Zhang (Tencent Youtu Lab) · Siming Huang (Duke University) · Ran Yi (Shanghai Jiao Tong University) · Ke Fan (Shanghai Jiaotong University) · Ruixin Zhang (Tencent Youtu Lab) · Peixian Chen (Xiamen University) · Jun Wang (None) · Shouhong Ding (Tencent Youtu Lab) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang (The University of Texas at Austin) · Bichen Wu (Facebook) · Jialiang Wang (Facebook) · Licheng Yu (None) · Kunpeng Li (Meta) · Yinan Zhao (Facebook) · Ishan Misra (Facebook) · Jia-Bin Huang (University of Maryland, College Park) · Peizhao Zhang (Facebook) · Peter Vajda (Facebook) · Diana Marculescu (The University of Texas at Austin)
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min (Peking University) · Dawei Zhao (Defense Innovation Institute) · Liang Xiao (Defense Innovation Institute) · Jian Zhao () · Xinli Xu (Hong Kong University of Science and Technology) · Zheng Zhu (Tsinghua University) · Lei Jin (Beijing University of Posts and Telecommunications) · Jianshu Li (Ant Group) · Yulan Guo (SUN YAT-SEN UNIVERSITY) · Junliang Xing (Tsinghua University) · Liping Jing (Beijing Jiaotong University) · Yiming Nie (National University of Defense Technology) · Bin Dai (National University of Defense Technology)
DiffLoc: Diffusion Model for Outdoor LiDAR Localization
Wen Li (School of Informatics, Xiamen University) · Yuyang Yang (Xiamen University) · Shangshu Yu (Xiamen University) · Guosheng Hu (Oosto) · Chenglu Wen (Xiamen University) · Ming Cheng (Xiamen University) · Cheng Wang (Xiamen University)
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
Kuan-Chih Huang (University of California, Merced) · Weijie Lyu (University of California, Merced) · Ming-Hsuan Yang (University of California at Merced) · Yi-Hsuan Tsai (Google)
SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Yihua Huang (None) · Yangtian Sun (None) · Ziyi Yang (None) · Xiaoyang Lyu (University of Hong Kong) · Yan-Pei Cao (Tencent ARC Lab) · Xiaojuan Qi (University of Oxford)
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu (Facebook) · Ching-Yao Chuang (Meta) · Xiaoyan Wang (Massachusetts Institute of Technology) · Yichen Jia (Facebook) · Kapil Krishnakumar (Meta, Inc.) · Tong Xiao (None) · Feng Liang (The University of Texas at Austin) · Licheng Yu (None) · Peter Vajda (Facebook)
SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Hsuan-I Ho (ETHZ - ETH Zurich) · Jie Song (ETHZ - ETH Zurich) · Otmar Hilliges (None)
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
ZHIXIANG WEI (University of Science and Technology of China) · Lin Chen (University of Science and Technology of China) · Xiaoxiao Ma (University of Science and Technology of China) · Huaian Chen (University of Science and Technology of China) · Tianle Liu (University of Science and Technology of China) · Pengyang Ling (University of Science and Technology of China) · Jinjin Zheng (University of Science and Technology of China) · Ben Wang (University of Science and Technology of China) · Yi Jin (University of Science and Technology of China)
OpenStreetView-5M: The Many Roads to Global Visual Geolocation
Guillaume Astruc (ENPC/IGN/CNES) · Nicolas Dufour (Ecole Nationale des Ponts et Chaussées) · Ioannis Siglidis (Ecole Nationale des Ponts et Chaussées) · Constantin Aronssohn (ENPC, Ecole Nationale des Ponts et Chaussées) · Nacim Bouia (Ecole Normale Superieure) · Stephanie Fu (University of California, Berkeley) · Romain Loiseau (IMAGINE - LIGM - ENPC, LASTIG - IGN) · Van Nguyen Nguyen (Ecole des Ponts ParisTech) · Charles Raude (ENPC, Ecole Nationale des Ponts et Chaussées) · Elliot Vincent (Imagine (LIGM) - Willow (Inria)) · Lintao XU (Université Gustave Eiffel) · Hongyu Zhou (Ecole Nationale des Ponts et Chaussées) · Loic Landrieu (ENPC, IGN)
An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing
Feiran Hu (Nanjing University of Science and Technology) · Chenlin Zhang (Moonshot AI, Ltd) · Jiangliang GUO (www.ainnovation.com) · Xiu-Shen Wei (Nanjing University of Science and Technology) · Lin Zhao (Nanjing University of Science and Technology) · Anqi Xu (University of Toronto) · Lingyan Gao (AInnovation Lab)
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
Kaiyu Song (SUN YAT-SEN UNIVERSITY) · Hanjiang Lai (SUN YAT-SEN UNIVERSITY) · Yan Pan (SUN YAT-SEN UNIVERSITY) · Jian Yin ()
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Ivan Rodin (University of Catania) · Antonino Furnari (University of Catania) · Kyle Min (Intel Labs) · Subarna Tripathi (Intel Corporation) · Giovanni Maria Farinella (University of Catania, Italy)
Multi-Session SLAM using Wide-Baseline Optical Flow
Lahav Lipson (Princeton University) · Jia Deng (Princeton University)
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
Xiao Ma (SEA AI Lab) · Sumit Patidar (Dyson) · Iain Haughton (Dyson Ltd) · Stephen James (Dyson)
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu (LY Corporation) · Tsubasa Takahashi (LY Corporation) · Hirokatsu Kataoka (LY Corporation)
UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization
Shuaibo Li (Beijing University of Technology & Institute of Automation, Chinese Academy of Sciences) · Wei Ma (Beijing University of Technology) · Jianwei Guo (Institute of Automation, Chinese Academy of Sciences) · Shibiao Xu (Beijing University of Posts and Telecommunications) · Benchong Li (Beijing University of Technology) · Xiaopeng Zhang (Institute of Automation, Chinese Academy of Sciences)
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
Jingyuan Yang (Shenzhen University) · Jiawei Feng (Shenzhen University) · Hui Huang (Shenzhen University)
Decoupled Pseudo-labeling in Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang (SUN YAT-SEN UNIVERSITY) · Jiaming Li (Baidu) · Xiangru Lin (Baidu) · Wei Zhang (Baidu) · Xiao Tan (Baidu) · Junyu Han (Baidu) · Errui Ding (Baidu Inc.) · Jingdong Wang (Baidu) · Guanbin Li (Sun Yat-sen University)
Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization
Ziying Xia () · Jian Cheng (University of Electronic Science and Technology of China) · Siyu Liu () · Yongxiang Hu (University of Electronic Science and Technology of China) · Shiguang Wang (None) · Zhang Yijie (University of Electronic Science and Technology of China) · Wanli Dang (University of Electronic Science and Technology of China)
Selective, Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Filip Ilic (Graz University of Technology) · He Zhao (York University) · Thomas Pock (Graz University of Technology) · Richard P. Wildes (York University)
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Kyle Sargent (Computer Science Department, Stanford University) · Zizhang Li (Zhejiang University) · Tanmay Shah (Google) · Charles Herrmann (Google) · Hong-Xing Yu (Computer Science Department, Stanford University) · Yunzhi Zhang (Stanford University) · Eric Ryan Chan (Stanford University) · Dmitry Lagun (Google) · Li Fei-Fei (Stanford University) · Deqing Sun (Google) · Jiajun Wu (Stanford University)
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Haozhe Xie (Nanyang Technological University) · Zhaoxi Chen (Nanyang Technological University) · Fangzhou Hong (Nanyang Technological University) · Ziwei Liu (Nanyang Technological University)
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Yang Qin (Sichuan University) · Yingke Chen (Northumbria University) · Dezhong Peng (Sichuan University) · Xi Peng (Sichuan University) · Joey Tianyi Zhou (National University of Singapore) · Peng Hu (Sichuan University)
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
Yuhang Yang (University of Science and Technology of China) · Wei Zhai (University of Science and Technology of China) · Hongchen Luo (University of Science and Technology of China) · Yang Cao (University of Science and Technology of China) · Zheng-Jun Zha (University of Science and Technology of China)
Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection
Zhanwei Zhang (None) · Minghao Chen (Zhejiang University) · Shuai Xiao (Alibaba Group) · Liang Peng (FABU Inc) · Hengjia Li (FABU Inc) · Binbin Lin (Zhejiang University) · Ping Li (Hangzhou Dianzi University) · Wenxiao Wang (Zhejiang University) · Boxi Wu (Zhejiang University) · Deng Cai (Zhejiang University)
Brush2Prompt: Contextual Prompt Generator for Object Inpainting
Mang Tik Chiu (University of Illinois, Urbana Champaign) · Yuqian Zhou (University of Illinois, Urbana-Champaign) · Lingzhi Zhang (School of Engineering and Applied Science, University of Pennsylvania) · Zhe Lin (Adobe Research) · Connelly Barnes (Adobe Systems) · Sohrab Amirghodsi (Adobe) · Eli Shechtman (Adobe) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
Weining Ren (ETHZ) · Zihan Zhu (ETHZ - ETH Zurich) · Boyang Sun (ETH Zurich) · Jiaqi Chen (ETHZ - ETH Zurich) · Marc Pollefeys (ETH Zurich / Microsoft) · Songyou Peng (ETH Zurich & MPI Tübingen)
Step differences in instructional video
Tushar Nagarajan (Meta) · Lorenzo Torresani (Facebook)
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen (Hunan University) · Hongshan Yu (Hunan University) · Zhengeng Yang (Hunan University) · Zechuan Li (Hunan University) · Wei Sun (Hunan University) · Chen Chen ()
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang (Shanghai Jiao Tong University) · Aofan Jiang (Shanghai Jiao Tong University) · Jinghao Feng (Shanghai Jiao Tong University) · Ya Zhang (Shanghai Jiao Tong University) · Xinchao Wang (National University of Singapore) · Yanfeng Wang (Shanghai Jiao Tong University)
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song (Dealicious Inc) · Taebaek Hwang (None) · Jooyoung Yoon (Dealicious Inc) · Shunghyun Choi (Dealicious Inc.) · Yeong Hyeon Gu (Sejong University)
Total Selfie: Generating Full-Body Selfies
Bowei Chen (University of Washington) · Brian Curless (University of Washington) · Ira Kemelmacher-Shlizerman (UW + Google) · Steve Seitz (University of Washington)
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
Minyoung Hwang (Seoul National University) · Luca Weihs (Allen Institute for Artificial Intelligence) · Chanwoo Park (Massachusetts Institute of Technology) · Kimin Lee (KAIST) · Aniruddha Kembhavi (Allen Institute for Artificial Intelligence) · Kiana Ehsani (Allen Institute for Artificial Intelligence)
LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding
Min Liang (University of Science and Technology Beijing) · Jia-Wei Ma (University of Science and Technology Beijing) · Xiaobin Zhu (University of Science and Technology Beijing) · Jingyan Qin (University of Science and Technology Beijing) · Xu-Cheng Yin (University of Science and Technology Beijing)
Accelerating Neural Field Training via Soft Mining
Shakiba Kheradmand (University of British Columbia) · Daniel Rebain (University of British Columbia) · Gopal Sharma (None) · Hossam Isack (Google) · Abhishek Kar (Google) · Andrea Tagliasacchi (Simon Fraser University, Google Brain) · Kwang Moo Yi (University Of British Columbia)
PEGASUS: Personalized Generative 3D Avatars with Composable Attributes
Hyunsoo Cha (Seoul National University) · Byungjun Kim (Seoul National University) · Hanbyul Joo (Seoul National University)
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo (DAMO Academy, Alibaba Group) · Yufan Shen (Zhejiang University) · Zhaoqing Zhu (Alibaba Group) · Qi Zheng (Alibaba Group) · Zhi Yu (Zhejiang University) · Cong Yao (Alibaba DAMO Academy)
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma (ETH Zurich, INSAIT Sofia) · Danda Paudel (INSAIT, Sofia University) · Ajad Chhatkuli (Swiss Federal Institute of Technology) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
Depth Prompting for Sensor-Agnostic Depth Estimation
Jin-Hwi Park (GIST) · Chanhwi Jeong (Gwangju Institute of Science and Technology) · Junoh Lee (Gwangju Institute of Science and Technology) · Hae-Gon Jeon (GIST)
Modality-Collaborative Test-Time Adaptation for Action Recognition
Baochen Xiong (Institute of Automation, Chinese Academy of Sciences; Peng Cheng Lab) · Xiaoshan Yang (Institute of Automation, Chinese Academy of Sciences) · Yaguang Song (Peng Cheng Laboratory) · Yaowei Wang (Pengcheng Laboratory) · Changsheng Xu (None)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang (National University of Singapore) · Mingli Zhu (The Chinese University of Hong Kong(Shen Zhen)) · Aishan Liu () · Baoyuan Wu (The Chinese University of Hong Kong, Shenzhen) · Xiaochun Cao (SUN YAT-SEN UNIVERSITY) · Ee-Chien Chang (National University of Singapore)
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu (None) · Xintao Wang (Tencent) · Yixiao Ge (Tencent) · Ying Shan (Tencent) · Mike Zheng Shou (National University of Singapore)
Weakly Supervised Video Individual Counting
Xinyan Liu (None) · Guorong Li (University of Chinese Academy of Sciences) · Yuankai Qi (The University of Adelaide) · Ziheng Yan (University of Chinese Academy of Sciences) · Zhenjun Han (University of the Chinese Academy of Sciences) · Anton van den Hengel (University of Adelaide) · Ming-Hsuan Yang (University of California at Merced) · Qingming Huang (University of Chinese Academy of Sciences)
SHINOBI: SHape and Illumination using Neural Object decomposition via BRDF optimization and Inverse rendering from unconstrained Image collections
Andreas Engelhardt (University of Tübingen) · Amit Raj (Google) · Mark Boss (Stability AI) · Yunzhi Zhang (Stanford University) · Abhishek Kar (Google) · Yuanzhen Li (Massachusetts Institute of Technology) · Ricardo Martin-Brualla (Google) · Jonathan T. Barron (Google) · Deqing Sun (Google) · Hendrik Lensch (University of Tübingen) · Varun Jampani (Google Research)
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li (Tencent) · Qiang Nie (Tencent Youtu Lab) · Weifu Fu (Tencent Youtu Lab) · Yuhuan Lin (Tencent Youtu Lab) · Guangpin Tao (Tencent YoutuLab) · Yong Liu (Tencent Youtu Lab) · Chengjie Wang (Tencent Youtu Lab; Shanghai Jiao Tong University)
Learning Group Activity Features Through Person Attribute Prediction
Chihiro Nakatani (TTI-J) · Hiroaki Kawashima (University of Hyogo) · Norimichi Ukita (Toyota Technological Institute)
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha (Rochester Institute of Technology) · Yang Zou (Amazon) · Qiuyu Chen (Amazon) · Zhiheng Li (Amazon AGI) · Yusheng Xie (Amazon) · Siqi Deng (Amazon)
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
Mude Hui (University of California, Santa Cruz) · Zihao Wei (University of Michigan - Ann Arbor) · Hongru Zhu (None) · Fei Xia (Ecole Normale Supérieure de Paris) · Yuyin Zhou (UC Santa Cruz)
Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation
Mohammad Amin Shabani (Simon Fraser University) · Zhaowen Wang (Adobe Research) · Difan Liu (Adobe Research) · Nanxuan Zhao (Adobe Research) · Jimei Yang (Adobe Research) · Yasutaka Furukawa (Simon Fraser University)
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng (Nanyang Technological University) · Hang Zhang (Sichuan University) · Guanzheng Chen (SUN YAT-SEN UNIVERSITY) · Xin Li (Alibaba Group) · Shijian Lu (Nanyang Technological University) · Chunyan Miao (School of Computer Science and Engineering, Nanyang Technological University) · Lidong Bing (Alibaba DAMO Academy)
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang (Hong Kong University of Science and Technology) · Xin Yang (The Hong Kong University of Science and Technology) · Jiantao Lin (Hong Kong University of Science and Technology) · Haodong LI (Hong Kong University of Science and Technology) · Xiaogang Xu (Zhejiang Lab) · Ying-Cong Chen (The Hong Kong University of Science and Technology)
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Taeckyung Lee (KAIST) · Sorn Chottananurak (KAIST) · Taesik Gong (Bell Labs) · Sung-Ju Lee (Korea Advanced Institute of Science & Technology)
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
Li Maomao (The University of Hong Kong) · Yu Li (International Digital Economy Academy) · Tianyu Yang (IDEA) · Yunfei Liu (International Digital Economy Academy (IDEA)) · Dongxu Yue (Peking University) · Zhihui Lin (Xverse) · Dong Xu (University of Hong Kong)
A Simple Recipe for Language-guided Domain Generalized Segmentation
Mohammad Fahes (INRIA) · TUAN-HUNG VU (valeo.ai) · Andrei Bursuc (valeo.ai) · Patrick Pérez (None) · Raoul de Charette (Inria)
Gradient Alignment for Cross-domain Face Anti-Spoofing
MINH BINH LE (Sungkyunkwan University) · Simon Woo (Sungkyunkwan University)
Multi-Object Tracking in the Dark
Xinzhe Wang (Beijing Institute of Technology) · Kang Ma (Beijing Institute of Technology) · Qiankun Liu (Beijing Institute of Technology) · Yunhao Zou (None) · Ying Fu (None)
RoMa: Robust Dense Feature Matching
Johan Edstedt (Computer Vision Laboratory, Linköping University) · Qiyu Sun (East China University of Science and Technology) · Georg Bökman (Chalmers University of Technology) · Mårten Wadenbäck (Linköping University) · Michael Felsberg (Linköping University)
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
Sikai Bai (The Hong Kong University of Science and Technology) · Jie ZHANG (The Hong Kong Polytechnic University) · Song Guo (Department of Computer Science and Engineering, Hong Kong University of Science and Technology) · Shuaicheng Li (Sensetime Group Limited) · Jingcai Guo (The Hong Kong Polytechnic University) · Jun Hou (Sensetime) · Tao Han (Northwestern Polytechnical University) · Xiaocheng Lu (Northwestern Polytechnical University)
Federated Online Adaptation for Deep Stereo
Matteo Poggi (Università di Bologna) · Fabio Tosi (University of Bologna)
ReconFusion: 3D Reconstruction with Diffusion Priors
Rundi Wu (Columbia University) · Ben Mildenhall (Google) · Philipp Henzler (Google) · Ruiqi Gao (Google) · Keunhong Park (Google) · Daniel Watson (Google DeepMind) · Pratul P. Srinivasan (Google Research) · Dor Verbin (None) · Jonathan T. Barron (Google) · Ben Poole (Google) · Aleksander Holynski (UC Berkeley & Google Research)
Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention
Ju-Hyeon Nam (Inha University) · Nur Suriza Syazwany (Inha University) · Su Jung Kim (Inha University) · Sang-Chul Lee (Inha University)
DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Yuming Gu (USC Institute for Creative Technologies, University of Southern California) · Hongyi Xu (Bytedance) · You Xie (Bytedance) · Guoxian Song (Bytedance Inc) · Yichun Shi (ByteDance) · Di Chang (University of Southern California | TikTok US) · Jing Yang (USC Institute for Creative Technologies) · Linjie Luo (ByteDance Inc.)
Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
Gaurav Shrivastava (Department of Computer Science, University of Maryland, College Park) · Abhinav Shrivastava (University of Maryland)
Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly (Mayo Clinic) · Abubakr Shafique (Mayo Clinic) · Peyman Nejat (Mayo Clinic) · Jibran Khan (Luther College) · Areej Alsaafin (Mayo Clinic) · Ghazal Alabtah (Mayo Clinic) · Hamid Tizhoosh (None)
Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin (University of British Columbia) · Gopal Sharma (None) · Shweta Mahajan (University of British Columbia) · Xingzhe He (None) · Hossam Isack (Google) · Abhishek Kar (Google) · Helge Rhodin (UBC) · Andrea Tagliasacchi (Simon Fraser University, Google Brain) · Kwang Moo Yi (University Of British Columbia)
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
JINLONG LI (Cleveland State University) · Baolu Li (Cleveland State University) · Zhengzhong Tu (University of Texas at Austin) · XINYU LIU (Cleveland State University) · Qing Guo (Institute of High Performance Computing, Singapore, A*STAR) · Felix Juefei Xu () · Runsheng Xu (University of California, Los Angeles) · Hongkai Yu (Cleveland State University)
MaGGIe: Masked Guided Gradual Human Instance Matting
Chuong Huynh (University of Maryland, College Park) · Seoung Wug Oh (Adobe Systems) · Abhinav Shrivastava (University of Maryland) · Joon-Young Lee (Adobe Research)
JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
Duy Tho Le (Monash University) · Chenhui Gou (Monash University) · Stavya Datta (Monash University) · Hengcan Shi (None) · Ian Reid (University of Adelaide) · Jianfei Cai (Monash University) · Hamid Rezatofighi (Monash University)
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin (Xi'an Jiaotong University)
Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
Yuang Ai (Institute of Automation, Chinese Academy of Sciences) · Xiaoqiang Zhou (University of Science and Technology of China) · Huaibo Huang (Institute of Automation, Chinese Academy of Sciences) · Lei Zhang (The Hong Kong Polytechnic University) · Ran He (None)
Intensity-Robust Autofocus for Spike Camera
Changqing Su (Peking University) · Zhiyuan Ye (Nanchang Hangkong University) · Yongsheng Xiao (Nanchang Hangkong University) · You Zhou (Nanjing University) · Zhen Cheng (Tsinghua University) · Bo Xiong (Peking University) · Zhaofei Yu (Peking University) · Tiejun Huang (Peking University)
LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset
Haolin Liu (The Chinese University of Hong Kong, Shenzhen) · Chongjie Ye (The Chinese University of Hong Kong, Shenzhen) · Yinyu Nie (Huawei Technologies Ltd.) · Yingfan He (Chinese University of Hong Kong, Shenzhen) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
Generative Multimodal Models are In-Context Learners
Quan Sun (BAAI) · Yufeng Cui (Beihang University) · Xiaosong Zhang (Beijing Academy of Artificial Intelligence) · Fan Zhang (Beijing Academy of Artificial Intelligence) · Qiying Yu (Tsinghua University) · Yueze Wang (Beijing Academy of Artificial Intelligence) · Yongming Rao (Tsinghua University) · Jingjing Liu (Tsinghua University) · Tiejun Huang (Peking University) · Xinlong Wang (Beijing Academy of Artificial Intelligence)
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng (University of Science and Technology of China) · Binxin Yang (University of Science and Technology of China) · Tiankai Hang (Southeast University) · Chen Li (Xi'an Jiaotong University) · Shuyang Gu (Research, Microsoft) · Ting Zhang (Beijing Normal University) · Jianmin Bao (Microsoft) · Zheng Zhang (Microsoft) · Houqiang Li (University of Science and Technology of China) · Han Hu (Microsoft Research Asia) · Dong Chen (Microsoft) · Baining Guo (Microsoft Research)
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
Linfang Zheng (University of Birmingham) · Tze Ho Elden Tse (University of Birmingham) · Chen Wang (Department of computer science, the University of Hong Kong) · Yinghan Sun (Southern University of Science and Technology) · Hua Chen (Southern University of Science and Technology) · Aleš Leonardis (University of Birmingham) · Wei Zhang (Southern University of Science and Technology of China) · Hyung Jin Chang (University of Birmingham)
When StyleGAN Meets Stable Diffusion: a ${\mathcal{W}_+}$ Adapter for Personalized Image Generation
Xiaoming Li (MMLab@NTU) · Xinyu Hou (Nanyang Technological University) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY)
Learning from Synthetic Human Group Activities
Che-Jui Chang (Rutgers University) · Danrui Li (Rutgers University) · Deep Patel (NEC Laboratories America) · Parth Goel (Oracle) · Seonghyeon Moon (Roblox) · Samuel Sohn (Rutgers University) · Honglu Zhou (Rutgers University) · Sejong Yoon (The College of New Jersey) · Vladimir Pavlovic (Rutgers University) · Mubbasir Kapadia (Rutgers University)
From a Bird’s Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian (Tianjin University) · Ruize Han (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Wei Feng (Tianjin University) · Song Wang (University of South Carolina)
Prompt-enhanced Multiple Instance Learning for Weakly Supervised Anomaly Detection
Junxi Chen (None) · Liang Li (None) · Li Su (University of Chinese Academy of Sciences) · Zheng-Jun Zha (University of Science and Technology of China) · Qingming Huang (University of Chinese Academy of Sciences)
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Han Liang (ShanghaiTech University) · Jiacheng Bao (ShanghaiTech University) · Ruichi Zhang (ShanghaiTech University) · Sihan Ren (ShanghaiTech University) · Yuecheng Xu (ShanghaiTech University) · Sibei Yang (None) · Xin Chen (University of Chinese Academy of Sciences, ShanghaiTech University) · Jingyi Yu (ShanghaiTech University) · Lan Xu (ShanghaiTech University)
MeshPose: Unifying DensePose and 3D Body Mesh reconstruction
Eric-Tuan Le (University College London) · Antonios Kakolyris (Snap Inc.) · Petros Koutras (Snap Inc.) · Himmy Tam (Snap Inc.) · Efstratios Skordos (Snap Inc.) · George Papandreou (Snap Inc.) · Riza Alp Guler (Snap Inc.) · Iasonas Kokkinos (Snap Inc.)
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Yanhao Wu (Xi'an Jiaotong University) · Tong Zhang (EPFL) · Wei Ke (Xi'an Jiaotong University) · Congpei Qiu (Xi'an Jiaotong University) · Sabine Süsstrunk (None) · Mathieu Salzmann (EPFL)
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang (The Chinese University of Hong Kong) · Xiaohan Ding (Tencent AI Lab) · Kaixiong Gong (None) · Yixiao Ge (Tencent) · Ying Shan (Tencent) · Xiangyu Yue (The Chinese University of Hong Kong)
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang (Harbin Institute of Technology (shenzhen)) · Huaibin Wang (Harbin Institute of Technology,Shenzhen) · Luo Chuyao (None) · Xutao Li (Harbin Institute of Technology, Shenzhen) · Guotao liang (Harbin Institute of Technology(shenzhen)) · Yunming Ye (Harbin Institute of Technology, Shenzhen) · joeq (CEO) · Yao He (None)
Rethinking Multi-view Representation Learning via Distilled Disentangling
Guanzhou Ke (Beijing Jiaotong University) · Bo Wang (Peking University) · Xiao-Li Wang (Nanjing University of Science and Technology) · Shengfeng He (Singapore Management University)
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations
Lei Fan (Northwestern University) · Jianxiong Zhou (Northwestern University) · Xiaoying Xing (Northwestern University) · Ying Wu (Northwestern University)
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Shuai Yang (Nanyang Technological University) · Yifan Zhou (Nanyang Technological University) · Ziwei Liu (Nanyang Technological University) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY)
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff (University of Maryland, College Park) · Surya Koppisetti (Reality Defender Inc) · Nicolo Bonettini (Reality Defender) · Divyaraj Solanki (Reality Defender Inc.) · Ben Colman (Reality Defender) · Yaser Yacoob (University of Maryland, College Park) · Ali Shahriyari (Reality Defender) · Gaurav Bharaj (Reality Defender Inc)
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
Weiyi Lv (Shanghai University) · Yuhang Huang (National University of Defense Technology) · NING Zhang (PAII Inc.) · Ruei-Sung Lin (PAII Inc) · Mei Han (PAII Inc.) · Dan Zeng (Shanghai University)
Condition-Aware Neural Network for Controlled Image Generation
Han Cai (Massachusetts Institute of Technology) · Muyang Li (None) · Qinsheng Zhang (Georgia Institute of Technology) · Ming-Yu Liu (NVIDIA) · Song Han (Massachusetts Institute of Technology)
Preserving Fairness Generalization in Deepfake Detection
Li Lin () · Xinan He (Nanchang University) · Yan Ju (State University of New York at Buffalo) · Xin Wang (State University of New York at Albany) · Feng Ding (Nanchang University) · Shu Hu (Purdue University)
On The Vulnerability of Efficient Vision Transformers to Adversarial Computation Attacks
Navaneet K L (University of California, Davis) · Soroush Abbasi Koohpayegani (University of California, Davis) · Essam Sleiman (Harvard University) · Hamed Pirsiavash (University of California, Davis)
Transcriptomics-guided Slide Representation Learning in Computational Pathology
Guillaume Jaume (Harvard University) · Lukas Oldenburg (Harvard University) · Anurag Vaidya (Massachusetts Institute of Technology) · Richard J. Chen (Harvard University) · Drew F. K. Williamson (Massachusetts General Hospital, Harvard University) · Thomas Peeters (Harvard University) · Andrew Song (Brigham and Women's hospital) · Faisal Mahmood (Harvard University)
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
Tianshu Huang (Carnegie Mellon University) · John Miller (Carnegie Mellon University) · Akarsh Prabhakara (Carnegie Mellon University) · Tao Jin (Carnegie Mellon University) · Tarana Laroia (Carnegie Mellon University) · Zico Kolter (Carnegie Mellon University) · Anthony Rowe (Carnegie Mellon University)
Sparse views, Near light: A practical paradigm for uncalibrated point-light photometric stereo
Mohammed Brahimi (Technische Universität München) · Bjoern Haefner (Technical University Munich) · Zhenzhang Ye (Technische Universität München) · Bastian Goldluecke (University of Konstanz) · Daniel Cremers (Technical University Munich)
EgoGen: An Egocentric Synthetic Data Generator
Gen Li (ETH Zurich) · Kaifeng Zhao (ETHZ - ETH Zurich) · Siwei Zhang (ETH Zurich) · Xiaozhong Lyu (Department of Computer Science, ETHZ - ETH Zurich) · Mihai Dusmanu (Microsoft) · Yan Zhang (ETH Zurich) · Marc Pollefeys (ETH Zurich / Microsoft) · Siyu Tang (ETH Zurich)
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Yuan Dong (Alibaba Group) · Chuan Fang (Hong Kong University of Science and Technology) · Liefeng Bo (None) · Zilong Dong (Alibaba Group) · Ping Tan (Hong Kong University of Science and Technology)
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im (NAVER Cloud) · JeongYeon Nam (Naver Cloud) · Nokyung Park (NAVER) · Hyungmin Lee (NAVER) · Seunghyun Park (NAVER Cloud)
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee (University of Illinois Urbana-Champaign) · Bolin Lai (Georgia Institute of Technology) · Fiona Ryan (Georgia Institute of Technology) · Bikram Boote (University of Illinois, Urbana Champaign) · James Rehg (None)
EvDiG: Event-guided Direct and Global Components Separation
xinyu zhou (Peking University) · Peiqi Duan (None) · Boyu Li (Peking University) · Chu Zhou (Peking University) · Chao Xu (Peking University) · Boxin Shi (Peking University)
A2XP: Towards Private Domain Generalization
Geunhyeok Yu (Kyung Hee University) · Hyoseok Hwang (Kyung Hee University)
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
Qian Wang (Peking University) · Weiqi Li (Peking University) · Chong Mou (Peking University) · Xinhua Cheng (Peking University) · Jian Zhang (Peking University)
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
Yuan Zhang (Peking University) · Tao Huang (The University of Sydney) · Jiaming Liu (Peking University) · Tao Jiang (Zhejiang University) · Kuan Cheng (Peking University) · Shanghang Zhang (Peking University)
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Shanshan Zhong (SUN YAT-SEN UNIVERSITY) · Zhongzhan Huang (Sun Yat-Sen University) · Shanghua Gao (Harvard University) · Wushao Wen (SUN YAT-SEN UNIVERSITY) · Liang Lin (Sun Yat-sen University) · Marinka Zitnik (Harvard University) · Pan Zhou (Sea Group)
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Yiwen Chen (Nanyang Technological University) · Zilong Chen (Tsinghua University) · Chi Zhang (Tencent) · Feng Wang (Tsinghua University) · Xiaofeng Yang (Nanyang Technological University) · Yikai Wang (Tsinghua University) · Zhongang Cai (Nanyang Technological University) · Lei Yang (The Chinese University of Hong Kong) · Huaping Liu (Tsinghua University) · Guosheng Lin (Nanyang Technological University)
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis (Layer 6 AI) · Zhaoyan Liu (Layer6 AI) · Satya Krishna Gorti (Layer6 AI) · Valentin Villecroze (Layer 6) · Jesse C. Cresswell (Layer 6 AI) · Guangwei Yu (Layer6 AI) · Gabriel Loaiza-Ganem (Layer 6 AI) · Maksims Volkovs (Layer6 AI)
AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement
Shiwei Jin (None) · Zhen Wang (Qualcomm Technologies, Inc.) · Lei Wang (Qualcomm) · Peng Liu (Qualcomm Inc, QualComm) · Ning Bi (QualComm) · Truong Nguyen (University of California, San Diego)
Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Yakun Chang (None) · Yeliduosi Xiaokaiti (Peking University) · Yujia Liu (School of Computer Science, Peking University, Beijing, China) · Bin Fan (None) · Zhaojun Huang (Peking University) · Tiejun Huang (Peking University) · Boxin Shi (Peking University)
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang (Ohio State University, Columbus) · Wenbin He (Bosch) · Xiwei Xuan (University of California Davis) · Clint Sebastian (Bosch) · Jorge Piazentin Ono (Bosch) · Xin Li (Bosch Research) · Sima Behpour (Bosch Center for Artificial Intelligence (BCAI)) · Thang Doan (Bosch Center for Artificial Intelligence) · Liang Gou (Bosch) · Shen (Ohio State University) · Liu Ren (Bosch Research)
Solving Masked Jigsaw Puzzles with Diffusion Transformers
Jinyang Liu (Northeastern University) · Wondmgezahu Teshome (Northeastern University) · Sandesh Ghimire (QualComm) · Mario Sznaier (Northeastern University) · Octavia Camps (Northeastern University)
RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang (ETH Zurich) · Bharat Lal Bhatnagar (Meta) · Yuanlu Xu (Meta Reality Labs Research) · Alexander Winkler (Meta) · Petr Kadlecek (Meta Reality Labs Research) · Siyu Tang (ETH Zurich) · Federica Bogo (Meta)
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
Yu-Ying Yeh (University of California, San Diego) · Jia-Bin Huang (University of Maryland, College Park) · Changil Kim (Facebook) · Lei Xiao (None) · Thu Nguyen-Phuoc (Reality Labs Research, Meta) · Numair Khan (None) · Cheng Zhang (Facebook) · Manmohan Chandraker (UC San Diego) · Carl Marshall (Reality Labs Research) · Zhao Dong (Meta RL Research) · Zhengqin Li (Facebook)
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang (The Chinese University of Hong Kong) · Shengju Qian (The Chinese University of Hong Kong) · Bohao Peng (The Chinese University of Hong Kong) · Shu Liu (The Chinese University of Hong Kong) · Jiaya Jia (The Chinese University of Hong Kong)
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation
Jia-Hao Wu (National Yang Ming Chiao Tung University) · Fu-Jen Tsai (National Tsing Hua University) · Yan-Tsung Peng (National Chengchi University) · Charles Tsai (Qualcomm Inc, QualComm) · Chia-Wen Lin (National Tsing Hua University) · Yen-Yu Lin (National Yang Ming Chiao Tung University)
Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households
Zhihao Cao (Tsinghua University) · ZiDong Wang (Department of Automation, Tsinghua University) · Siwen Xie (Peking University) · Anji Liu (University of California, Los Angeles) · Lifeng Fan (Beijing Institute of General Artificial Intelligence)
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan (University of Maryland, College Park) · Fuxiao Liu (University of Maryland) · Xiyang Wu (University of Maryland, College Park) · Ruiqi Xian (University of Maryland, College Park) · Zongxia Li (University of Maryland, College Park) · Xiaoyu Liu (University of Maryland, College Park) · Xijun Wang (University of Maryland, College Park) · Lichang Chen (Department of Computer Science, University of Maryland, College Park) · Furong Huang (Department of Computer Science, University of Maryland) · Yaser Yacoob (University of Maryland, College Park) · Dinesh Manocha (University of Maryland, College Park) · Tianyi Zhou (University of Maryland, College Park)
CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs
Haocheng Yuan (University of Edinburgh) · Jing Xu (University of Edinburgh) · Hao Pan (Microsoft Research) · Adrien Bousseau (INRIA) · Niloy J. Mitra (University College London) · Changjian Li (University of Edinburgh)
ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention
Jiawei Wang (Shandong University) · Changjian Li (University of Edinburgh)
Active Domain Adaptation with False Negative Prediction for Object Detection
Yuzuru Nakamura (Panasonic Holdings Corporation) · Yasunori Ishii (Panasonic Holdings Corporation) · Takayoshi Yamashita (Chubu University)
HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Li Pang (Xi'an Jiaotong University) · Xiangyu Rui (Xi'an Jiaotong University) · Long Cui (Xi'an Jiaotong University) · Hongzhong Wang (Xi'an Jiaotong University) · Deyu Meng () · Xiangyong Cao (Xi'an Jiaotong University)
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao (Shanghai Jiao Tong University) · Mengxi Chen (Shanghai Jiaotong University) · Tianjie Dai (Shanghai Jiao Tong University) · Jiangchao Yao (Shanghai Jiaotong University) · Bo Han (HKBU) · Ya Zhang (Shanghai Jiao Tong University) · Yanfeng Wang (Shanghai Jiao Tong University)
Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline
Xiaoqi Zhao (Dalian University of Technology) · Youwei Pang (Dalian University of Technology) · Zhenyu Chen (Dalian University of Technology) · Qian Yu (Dalian University of Technology) · Lihe Zhang (Dalian University of Technology) · Hanqi Liu (Ohio State University, Columbus) · Jiaming Zuo (University of Southern California) · Huchuan Lu (Dalian University of Technology)
Cinematic Behavior Transfer via NeRF-based Differentiable Filming
Xuekun Jiang (Shanghai Artificial Intelligence Laboratory) · Anyi Rao (Stanford University) · Jingbo Wang (Shanghai AI LAB) · Dahua Lin (The Chinese University of Hong Kong) · Bo Dai (Shanghai AI Laboratory)
Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong (University of Sydney) · Minjing Dong (City University of Hong Kong) · Siqi Ma (University of New South Wales) · Seyit Camtepe (CSIRO) · Surya Nepal (CSIRO) · Chang Xu (University of Sydney)
$360+x$: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen (University of Birmingham) · Yuqi Hou (University of Birmingham) · Chenyuan Qu (University of Birmingham) · Irene Testini (Cardiff University) · Xiaohan Hong (University of Birmingham) · Jianbo Jiao (University of Birmingham)
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery
Yuqi Zhang (The Chinese University of Hong Kong, Shenzhen) · Guanying Chen (The Chinese University of Hong Kong, Shenzhen) · Jiaxing Chen (Sun Yat-Sen University) · Shuguang Cui (The Chinese University of Hong Kong, Shenzhen)
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen (School of Computer Science, University of Illinois at Urbana-Champaign) · Karan Sikka (SRI International) · Michael Cogswell (SRI International) · Heng Ji (University of Illinois, Urbana-Champaign) · Ajay Divakaran (SRI International)
Rich Human Feedback for Text-to-Image Generation
Youwei Liang (University of California, San Diego) · Junfeng He (Google) · Gang Li (Google) · Peizhao Li (GE HealthCare) · Arseniy Klimovskiy (Google) · Nicholas Carolan (Google) · Jiao Sun (University of Southern California) · Jordi Pont-Tuset (Google Research) · Sarah Young (Google) · Feng Yang (Google Research) · Junjie Ke (None) · Krishnamurthy Dvijotham (Google DeepMind) · Katherine Collins (University of Cambridge) · Yiwen Luo (Research, Google) · Yang Li (Google) · Kai Kohlhoff (Google Research) · Deepak Ramachandran (Google) · Vidhya Navalpakkam (Research, Google)
FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
Jinglin Xu (University of Science and Technology Beijing) · Sibo Yin (Peking University) · Guohao Zhao (Peking University) · Zishuo Wang (None) · Yuxin Peng (Peking University)
Readout Guidance: Learning Control from Diffusion Features
Grace Luo (University of California, Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Oliver Wang (Adobe Research) · Dan B Goldman (None) · Aleksander Holynski (UC Berkeley & Google Research)
MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
Yixin Liu (Lehigh University) · Chenrui Fan (Huazhong University of Science and Technology) · Yutong Dai (Lehigh University) · Xun Chen (Samsung Research America) · Pan Zhou (Huazhong University of Science and Technology) · Lichao Sun (Lehigh University)
A theory of volumetric representations for opaque solids
Bailey Miller (Carnegie Mellon University) · Hanyu Chen (Carnegie Mellon University) · Alice Lai (Carnegie Mellon University) · Ioannis Gkioulekas (Carnegie Mellon University)
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee (Vidyasirimedhi Institute of Science and Technology) · Worameth Chinchuthakun (Tokyo Institute of Technology) · Nontaphat Sinsunthithet (Vidyasirimedhi Institute of Science and Technology) · Varun Jampani (Google Research) · Amit Raj (Google) · Pramook Khungurn (Cornell University) · Supasorn Suwajanakorn (Vidyasirimedhi Institute of Science and Technology)
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
Anh-Quan Cao (INRIA) · Angela Dai () · Raoul de Charette (Inria)
Describing Differences in Image Sets with Natural Language
Lisa Dunlap (University of California, Berkeley) · Yuhui Zhang (Stanford University) · Xiaohan Wang (Stanford University) · Ruiqi Zhong (University of California Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Jacob Steinhardt (University of California Berkeley) · Joseph Gonzalez (University of California - Berkeley) · Serena Yeung (Stanford)
Neural Lineage
Runpeng Yu (National University of Singapore) · Xinchao Wang (National University of Singapore)
Structure-from-Motion from Pixel-wise Correspondences
Philipp Lindenberger (Department of Computer Science, ETHZ - ETH Zurich) · Paul-Edouard Sarlin (ETH Zurich) · Marc Pollefeys (ETH Zurich / Microsoft)
LightIt: Illumination Modeling and Control for Diffusion Models
Peter Kocsis (None) · Kalyan Sunkavalli (Adobe Research) · Julien Philip (Adobe Systems) · Matthias Nießner (Technical University of Munich) · Yannick Hold-Geoffroy (Adobe Research)
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Baptiste Brument (IRIT, University of Toulouse, France) · Robin Bruneau (University of Copenhagen) · Yvain Queau (CNRS) · Jean Mélou (IRIT) · Francois Lauze (Department of Computer Science, University of Copenhagen) · Jean-Denis Durou (IRIT) · Lilian Calvet (OR-X, Balgrist Hospital, University of Zurich)
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
Hao Li (Northwest Polytechnical University) · Dingwen Zhang (Northwestern Polytechnical University) · Yalun Dai (Nanyang Technological University) · Nian Liu (Mohamed bin Zayed University of Artificial Intelligence) · Lechao Cheng (Hefei University of Technology) · Li Jingfeng (Northwest Polytechnical University Xi'an) · Jingdong Wang (Baidu) · Junwei Han (Northwestern Polytechnical University, Tsinghua University)
3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
Zhicheng Lu (Northwest Polytechnical University Xi'an) · xiang guo (Northwest Polytechnical University Xi'an) · Le Hui (Nanjing University Of Science And Technology) · Tianrui Chen (Northwest Polytechnical University Xi'an) · Min Yang (None) · Xiao Tang (None) · feng zhu (None) · Yuchao Dai (Northwestern Polytechnical University)
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
Yichen Yao (ShanghaiTech University) · Zimo Jiang (ShanghaiTech University) · YUJING SUN (The University of Hong Kong) · Zhencai Zhu (Innovation Academy for Microsatellites) · Xinge Zhu (The Chinese University of Hong Kong) · Runnan Chen (None) · Yuexin Ma (ShanghaiTech University)
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Yujun Shi (National University of Singapore) · Chuhui Xue (ByteDance Inc.) · Jun Hao Liew (ByteDance) · Jiachun Pan (National University of Singapore) · Hanshu Yan (ByteDance) · Wenqing Zhang (Huazhong University of Science and Technology) · Vincent Y. F. Tan (National University of Singapore) · Song Bai (ByteDance)
Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners
Junhao Dong (Nanyang Technological University) · Piotr Koniusz (Data61/CSIRO + Australian National University) · Junxi Chen (SUN YAT-SEN UNIVERSITY) · Xiaohua Xie (SUN YAT-SEN UNIVERSITY) · Yew-Soon Ong (Nanyang Technological University)
Generative Latent Coding for Ultra-Low Bitrate Image Compression
Zhaoyang Jia (University of Science and Technology of China) · Jiahao Li (Microsoft Research Asia) · Bin Li (Microsoft) · Houqiang Li (University of Science and Technology of China) · Yan Lu (Microsoft Research Asia)
SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation
Yanzhe Liu (None) · Rong Chen (Dalian Maritime University) · Yushi Li (Xi'an Jiaotong-Liverpool University) · Yixi Li (Dalian Maritime University) · Xuehou Tan (Tokai University)
Differentiable Point-based Inverse Rendering
Hoon-Gyu Chung (POSTECH) · Seokjun Choi (Pohang University of Science and Technology) · Seung-Hwan Baek (POSTECH)
GS-IR: 3D Gaussian Splatting for Inverse Rendering
Zhihao Liang (South China University of Technology) · Qi Zhang (Tencent AI Lab) · Ying Feng (Tencent AI Lab) · Ying Shan (Tencent) · Kui Jia (South China University of Technology)
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du (Beijing University of Posts and Telecommunications) · Sicheng Zhang (Beijing University of Posts and Telecommunications) · Binzhu Xie (Beijing University of Posts and Telecommunications) · Guoshun Nan (Beijing University of Posts and Telecommunications) · Jiayang Zhang (Beijing University of Posts and Telecommunications) · Junrui Xu (Beijing University of Posts and Telecommunications) · Hangyu Liu (Beijing University of Posts and Telecommunications) · Sicong Leng (Nanyang Technological University) · Jiangming Liu (Yunnan University) · Hehe Fan (None) · Dajiu Huang (South China University) · Jing Feng (Beijing University of Posts and Telecommunications) · Linli Chen (Sichuan University) · Can Zhang (Beijing University of Posts and Telecommunications) · Xuhuan Li (Beijing University of Posts and Telecommunications) · Hao Zhang (Beijing University of Posts and Telecommunications) · Jianhang Chen (Beijing University of Posts and Telecommunications) · Qimei Cui (Beijing University of Posts and Telecommunications) · Xiaofeng Tao (Beijing University of Posts and Telecommunications)
Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution
Guangyuan Li (Zhejiang University) · Chen Rao (Zhejiang University) · Juncheng Mo (Zhejiang University) · Zhanjie Zhang (Zhejiang University) · Wei Xing (Zhejiang University) · Lei Zhao (Zhejiang University)
Learning Equi-angular Representations for Online Continual Learning
Minhyuk Seo (Yonsei University) · Hyunseo Koh (Gwangju Institute of Science and Technology) · Wonje Jeung (Yonsei University) · Minjae Lee (Yonsei University) · San Kim (Yonsei University) · Hankook Lee (Sungkyunkwan University) · Sungjun Cho (LG AI Research) · Sungik Choi (LG AI Research) · Hyunwoo Kim (Zhejiang Lab) · Jonghyun Choi (Seoul National University)
Improving Bird’s Eye View Semantic Segmentation by Task Decomposition
Tianhao Zhao (Wuhan University) · Yongcan Chen (Wuhan University) · Yu Wu (Wuhan University) · Tianyang Liu (Wuhan University) · Bo Du (Wuhan University) · Peilun Xiao (Didi Research) · shi qiu (None) · Hongda Yang (Beijing DiDi Infinity Technology and Development Co., Ltd.) · Guozhen Li (Didi Global) · yi yang (Didi Global) · Yutian Lin (Wuhan University)
Neural Video Compression with Feature Modulation
Jiahao Li (Microsoft Research Asia) · Bin Li (Microsoft) · Yan Lu (Microsoft Research Asia)
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
Yu Wang (Baidu) · Xin Li (Baidu) · Shengzhao Wen (Baidu) · gang zhang (Baidu Inc.) · Haixiao Yue (Baidu) · Haocheng Feng (Baidu) · Junyu Han (Baidu) · Errui Ding (Baidu Inc.)
Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng (None) · Pan Gao (Nanjing University of Aeronautics and Astronautics, Tsinghua University) · Peng Zhou (Nanjing University of Aeronautics and Astronautics) · Jie Qin (Nanjing University of Aeronautics and Astronautics)
Efficient Test-Time Adaptation of Vision-Language Models
Adilbek Karmanov (Mohamed bin Zayed University of Artificial Intelligence) · Dayan Guan (Nanyang Technological University) · Shijian Lu (Nanyang Technological University) · Abdulmotaleb El Saddik (Mohamed bin Zayed University of Artificial Intelligence) · Eric P. Xing (Mohamed bin Zayed University of AI)
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Xianqi Wang (Huazhong University of Science and Technology) · Gangwei Xu (Huazhong University of Science and Technology) · Hao Jia (Huazhong University of Science and Technology) · Xin Yang (Huazhong University of Science and Technology)
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz (None) · Mohammad Sabokrou (Okinawa Institute of Science and Technology (OIST)) · Amir Rasouli (Huawei Technologies Canada)
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
Jingyao Xu (Beijing Jiaotong University) · Yuetong Lu (Beijing Jiaotong University) · Yandong Li (Google Research) · Siyang Lu (Beijing Jiaotong University) · Dongdong Wang (University of Central Florida) · Xiang Wei (Beijing Jiaotong University)
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Hossain Hossain (None) · Mennatullah Siam (None) · Leonid Sigal (University Of British Columbia) · Jim Little (University of British Columbia, Canada)
S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
Xuyang Li (None) · Danfeng Hong (Chinese Academy of Sciences, Aerospace Information Research Institute) · Jocelyn Chanussot (INRIA)
Object Pose Estimation via the Aggregation of Diffusion Features
Tianfu Wang (University of Chinese Academy of Sciences) · Guosheng Hu (Oosto) · Hongguang Wang (Shenyang Institute of Automation)
In Search of a Data Transformation That Accelerates Neural Field Training
Junwon Seo (None) · Sangyoon Lee (POSTECH) · Kwang In Kim (Pohang University of Science and Technology) · Jaeho Lee (POSTECH)
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh (Korea University) · Hwanhee Jung (Korea University) · Giljoo Nam (Meta) · Jinseop Yeom (Korea University) · Hyunje Park (Korea University) · Sang Ho Yoon (KAIST) · Sangpil Kim (Korea University)
In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification
Jinseong Park (Seoul National University) · Yujin Choi (Seoul National University) · Jaewook Lee (Seoul National University)
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen (Huawei) · Dapeng Chen (Huawei Technologies Ltd.) · Ruijin Liu (Xi'an Jiaotong University) · Sai Zhou (Huawei Technologies Ltd.) · Wenyuan Xue (Huawei Technologies Ltd.) · Wei Peng (Huawei Technologies Ltd.)
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang (Yale University) · Chao Feng () · Ziyang Chen (University of Michigan) · Hyoungseob Park (Yale University) · Daniel Wang (Yale University) · Yiming Dou (University of Michigan - Ann Arbor) · Ziyao Zeng (Yale University) · xien chen (Yale University) · Suchisrit Gangopadhyay (Yale University) · Andrew Owens (University of Michigan) · Alex Wong (Yale University)
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev (Meta) · Antoni Bigata Casademunt (Imperial College London) · Konstantinos Vougioukas (Facebook) · Zoe Landgraf (Facebook) · Stavros Petridis (Facebook) · Maja Pantic (Facebook)
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang (School of Artificial Intelligence, University of Chinese Academy of Sciences) · Xing Nie (Institute of Automation, Chinese Academy of Sciences) · Tong Li (Meituan) · Gaopengfei (Beijing SanKuai Online Technology Co., Ltd.) · Ying Guo (Meituan) · Cheng Zhen (Meituan) · Pengfei Yan (Meituan) · Shiming Xiang (Institute of Automation, Chinese Academy of Sciences)
D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval
Yi Xie (South China University of Technology) · Yihong Lin (South China University of Technology) · Wenjie Cai () · Xuemiao Xu (South China University of Technology) · Huaidong Zhang (South China University of Technology) · Yong Du (Ocean University of China) · Shengfeng He (Singapore Management University)
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
Sijia Chen (Huazhong University of Science and Technology) · En Yu (Huazhong University of Science and Technology) · Jinyang Li (Huazhong University of Science and Technology) · Wenbing Tao (Huazhong University of Science and Technology)
Single-View Refractive Index Tomography with Neural Fields
Brandon Zhao (California Institute of Technology) · Aviad Levis (California Institute of Technology) · Liam Connor (California Institute of Technology) · Pratul P. Srinivasan (Google Research) · Katherine Bouman (California Institute of Technology)
MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning
Ahmed Agiza (None) · Marina Neseem (Brown University) · Sherief Reda (Brown University)
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Zuoyue Li (ETH Zürich) · Zhenqiang Li (The University of Tokyo) · Zhaopeng Cui (None) · Marc Pollefeys (ETH Zurich / Microsoft) · Martin R. Oswald (University of Amsterdam)
Adaptive Softassign via Hadamard-Equipped Sinkhorn
Binrui Shen (Xi'an Jiaotong-Liverpool University) · Qiang Niu (Xi'an Jiaotong-Liverpool University) · Shengxin Zhu (Beijing Normal University)
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras (NVIDIA) · Miika Aittala (NVIDIA) · Jaakko Lehtinen (Aalto University & NVIDIA) · Janne Hellsten (NVIDIA) · Timo Aila (NVIDIA) · Samuli Laine (NVIDIA)
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han (The Chinese University of Hong Kong) · Kaixiong Gong (None) · Yiyuan Zhang (The Chinese University of Hong Kong) · Jiaqi Wang (Shanghai AI Laboratory) · Kaipeng Zhang (Shanghai AI Laboratory) · Dahua Lin (The Chinese University of Hong Kong) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Peng Gao (The Chinese University of Hong Kong) · Xiangyu Yue (The Chinese University of Hong Kong)
See, Say, and Segment: Correcting False Premises with LMMs
Tsung-Han Wu (University of California, Berkeley) · Giscard Biamby (University of California, Berkeley) · David Chan (University of California Berkeley) · Lisa Dunlap (University of California, Berkeley) · Ritwik Gupta (Defense Innovation Unit) · Xudong Wang (Electrical Engineering & Computer Science Department, University of California Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Joseph Gonzalez (University of California - Berkeley)
Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
Suyeon Kim (Pohang University of Science and Technology) · Dongha Lee (Yonsei University) · SeongKu Kang (University of Illinois Urbana-Champaign) · Sukang Chae (Pohang University of Science and Technology) · Sanghwan Jang (POSTECH) · Hwanjo Yu (POSTECH)
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz (Technion - Israel Institute of Technology, Technion) · Yair Kittenplon (AWS AI Labs) · Aviad Aberdam (Amazon AWS AI) · Elad Ben Avraham (Amazon) · Oren Nuriel (Amazon) · Shai Mazor (Amazon) · Ron Litman (Amazon AI Labs)
Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space
Chengyang Hu (Shanghai Jiao Tong University) · Ke-Yue Zhang (Tencent) · Taiping Yao (Tencent Youtu Lab) · Shouhong Ding (Tencent Youtu Lab) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
Towards Efficient Replay in Federated Incremental Learning
Yichen Li (Huazhong University of Science and Technology) · Qunwei Li (Ant Group) · Haozhao Wang (Huazhong University of Science and Technology) · Ruixuan Li (Huazhong University of Science and Technology) · Wenliang Zhong (Ant Group) · Guannan Zhang (Tongji University)
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
Alexandros Delitzas (ETH Zurich) · Ayça Takmaz (None) · Federico Tombari (Google, TUM) · Robert Sumner (Massachusetts Institute of Technology) · Marc Pollefeys (ETH Zurich / Microsoft) · Francis Engelmann (Department of Computer Science, ETHZ - ETH Zurich)
Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng (South China University of Technology) · Danqing Huang (Microsoft) · YU QIAO (Central South University) · Hu Zheng (Keio University, Tokyo Institute of Technology) · Chin-Yew Lin (Microsoft) · Tong Zhang (South China University of Technology) · C. L. Philip Chen (South China University of Technology)
ControlRoom3D: Room Generation using Semantic Controls
Jonas Schult (Rheinisch Westfälische Technische Hochschule Aachen) · Sam Tsai (Meta) · Lukas Hoellein (None) · Bichen Wu (Facebook) · Jialiang Wang (Facebook) · Chih-Yao Ma (Facebook) · Kunpeng Li (Meta) · Xiaofang Wang (Meta) · Felix Wimbauer (Technical University of Munich) · Zijian He (None) · Peizhao Zhang (Facebook) · Bastian Leibe (RWTH Aachen University) · Peter Vajda (Facebook) · Ji Hou (Facebook)
LAN: Learning to Adapt Noise for Image Denoising
Changjin Kim (Hanyang University) · Tae Hyun Kim (Hanyang Univ.) · Sungyong Baik (Hanyang University)
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li (Peking University) · Yiming Qin (Peking University) · Minghang Zheng (Peking University) · Xin Jin (Beijing Electronic Science and Technology Institute) · Yang Liu (Peking University)
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou () · Ziqi Pang (UIUC) · Yu-Xiong Wang (None)
DiaLoc: An Iterative Approach to Embodied Dialog Localization
Chao Zhang (Toshiba Europe Ltd) · Mohan Li (Toshiba Europe Ltd) · Ignas Budvytis (University of Cambridge) · Stephan Liwicki (Toshiba Europe Ltd)
Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu (Department of Computer Science, ETHZ - ETH Zurich) · Prashanth Chandran (None) · Sebastian Weiss (DisneyResearch|Studios) · Markus Gross (Disney Research, Disney) · Gaspard Zoss (Disney Research, Disney) · Derek Bradley (DisneyResearch|Studios)
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen (VinAI Research) · Anh Tran (VinAI Research)
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang (Rutgers University) · Bichen Wu (Facebook) · Xiaoyan Wang (Massachusetts Institute of Technology) · Yaqiao Luo (Facebook) · Luxin Zhang (Meta) · Yinan Zhao (Facebook) · Peter Vajda (Facebook) · Dimitris N. Metaxas (Rutgers) · Licheng Yu (None)
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
Hao Xiong (Shanghai Jiao Tong University) · Yehui Tang (Shanghai Jiaotong University) · Xinyu Ye (Shanghai Jiaotong University) · Junchi Yan (Shanghai Jiao Tong University)
Neural Implicit Morphing of Face Images
Guilherme Schardong (Institute of Systems and Robotics, University of Coimbra) · Tiago Novello (IMPA) · Hallison Paz (IMPA) · Iurii Medvedev (Institute of Systems and Robotics, University of Coimbra) · Vinícius Silva (PUC-Rio) · Luiz Velho (IMPA) · Nuno Gonçalves (University of Coimbra)
GDA: Generalized Diffusion for Robust Test-time Adaptation
Yun-Yun Tsai (Columbia University) · Fu-Chen Chen (Amazon Lab126) · Albert Chen (Amazon) · Junfeng Yang (Columbia University) · Che-Chun Su (Amazon) · Min Sun (Amazon/NTHU) · Cheng-Hao Kuo (Amazon)
Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds
Zhimin Yuan (School of Informatics Xiamen University) · Wankang Zeng (Xiamen University) · Yanfei Su (Xiamen University) · Weiquan Liu (Xiamen University) · Ming Cheng (Xiamen University) · Yulan Guo (SUN YAT-SEN UNIVERSITY) · Cheng Wang (Xiamen University)
SubT-MRS Datasets: Pushing SLAM Towards All-weather Environments
Shibo Zhao (Carnegie Mellon University) · Yuanjun Gao (Carnegie Mellon University) · Tianhao Wu (University of Virginia, Charlottesville) · Damanpreet Singh (CMU, Carnegie Mellon University) · Rushan Jiang (Oracle) · Haoxiang Sun (Carnegie Mellon University) · Mansi Sarawata (CMU, Carnegie Mellon University) · Warren Whittaker (Carnegie Mellon University) · Ian Higgins (Carnegie Mellon University) · Shaoshu Su (State University of New York at Buffalo) · Yi Du (State University of New York at Buffalo) · Can Xu (None) · John Keller (Carnegie Mellon University) · Jay Karhade (Carnegie Mellon University) · Lucas Nogueira (Carnegie Mellon University) · Sourojit Saha (CMU, Carnegie Mellon University) · Yuheng Qiu (CMU, Carnegie Mellon University) · Ji Zhang (Carnegie Mellon University) · Wenshan Wang (School of Computer Science, Carnegie Mellon University) · Chen Wang (University at Buffalo) · Sebastian Scherer (None)
SpecNeRF: Gaussian Directional Encoding for Specular Reflections
Li Ma (None) · Vasu Agrawal (Meta Reality Labs Research) · Haithem Turki (Carnegie Mellon University) · Changil Kim (Facebook) · Chen Gao (Meta) · Pedro V. Sander (Hong Kong University of Science and Technology) · Michael Zollhoefer (Meta) · Christian Richardt (Meta Reality Labs)
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao (Rutgers University, New Brunswick) · Long Zhao (Google DeepMind) · Vijay Kumar BG (NEC Laboratories America) · Yumin Suh (NEC Labs America) · Dimitris N. Metaxas (Rutgers) · Manmohan Chandraker (UC San Diego) · Samuel Schulter (NEC Laboratories America)
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen (SketchX) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Subhadeep Koley (University of Surrey) · Aneeshan Sain (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Yi-Zhe Song (None)
SPIDeRS: Structured Polarization for Invisible Depth and Reflectance Sensing
Tomoki Ichikawa (Kyoto University) · Shohei Nobuhara (Kyoto Institute of Technology) · Ko Nishino (Kyoto University)
UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
Junsheng Zhou (Tsinghua University) · Weiqi Zhang (Tsinghua University) · Baorui Ma (BAAI) · Kanle Shi (Kuaishou Technology) · Yu-Shen Liu (None) · Zhizhong Han (Wayne State University)
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo (Huazhong University of Science and Technology) · Hanyu Zhou (Huazhong University of Science and Technology) · Ying Nie (Huawei Noah's Ark Lab) · Feng Zhang (Huazhong University of Science and Technology) · Tianyu Guo (Peking University) · Nong Sang (Huazhong University of Science and Technology) · Yunhe Wang (Huawei Noah's Ark Lab) · Changxin Gao (Huazhong University of Science and Technology)
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun SHUM (The Hong Kong University of Science and Technology) · Jaeyeon Kim (Hong Kong University of Science and Technology) · Binh-Son Hua (Trinity College Dublin) · Thanh Nguyen (Deakin University) · Sai-Kit Yeung (The Hong Kong University of Science and Technology (HKUST))
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang (Tsinghua University) · Xin Wang (None) · Hong Chen (None) · Zihan Song (Tsinghua University) · Wenwu Zhu (Tsinghua University)
HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin (Zhejiang University) · Yongzhi Su (German Research Center for AI (DFKI)) · Praveen Nathan (German Research Center for AI) · Sandeep Inuganti (German Research Center for AI) · Yan Di (Technische Universität München) · Martin Sundermeyer (None) · Fabian Manhardt (Google) · Didier Stricker (Universität Kaiserslautern) · Jason Rambach (None) · Yu Zhang (Zhejiang University)
AnyScene: Customized Image Synthesis with Composited Foreground
Ruidong Chen (Tianjin University) · Lanjun Wang (Tianjin University) · Weizhi Nie (Tianjin University) · Yongdong Zhang (University of Science and Technology of China) · An-An Liu (Tianjin University)
TE-TAD: Towards Fully End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
Ho-Joong Kim (Korea University) · Jung-Ho Hong (Korea University) · Heejo Kong (Korea University) · Seong-Whan Lee (Korea University)
Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
Saeed Khorram (Apple) · Mingqi Jiang (Oregon State University) · Mohamad Shahbazi (ETH Zürich) · Mohamad Hosein Danesh (McGill University) · Li Fuxin (Oregon State University)
TRINS: Towards Multimodal Language Models That Can Read
Ruiyi Zhang (Adobe Research) · Yanzhe Zhang (Georgia Institute of Technology) · Jian Chen (Mohamed bin Zayed University of Artificial Intelligence) · Yufan Zhou (State University of New York, Buffalo) · Jiuxiang Gu (Adobe Systems) · Changyou Chen (State University of New York, Buffalo) · Tong Sun (Adobe Systems)
A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva (First Private Mathematical High School, Sofia, Bulgaria) · Mykyta Holubakha (INSAIT) · Andela Ilic (ETHZ - ETH Zurich) · Saman Motamed (INSAIT, Sofia University) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Danda Paudel (INSAIT, Sofia University)
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu (Nanyang Technological University) · Jianbin Zheng (South China University of Technology) · Chuanxia Zheng (University of Oxford) · Chaoyue Wang (JD Explore Academy) · Dacheng Tao (None) · Tat-Jen Cham (Nanyang Technological University)
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian (Australian National University) · Shu Zou (Australian National University) · Zhaoyuan Yang (General Electric) · Jing Zhang (Australian National University)
LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging
Haoyang Ge (Tianjin University) · Qiao Feng (None) · Hailong Jia (Tianjin University) · Xiongzheng Li (None) · Xiangjun Yin (None) · You Zhou (Nanjing University) · Jingyu Yang (Tianjin University) · Kun Li (None)
6-DoF Pose Estimation with MultiScale Residual Correlation
Yuelong Li (Amazon) · Yafei Mao (Amazon) · Raja Bala (Amazon) · Sunil Hadap (Amazon)
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
Jongwoo Choi (Visual Media Lab, KAIST) · Kwanggyoon Seo (KAIST) · Amirsaman Ashtari (MD Anderson Cancer Center) · Junyong Noh (Korea Advanced Institute of Science and Technology)
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
Xiaoyu Wu (Shanghai Jiaotong University) · Yang Hua (Queen's University Belfast) · Chumeng Liang (University of Southern California) · Jiaru Zhang (Shanghai Jiao Tong University) · Hao Wang (Louisiana State University) · Tao Song (Shanghai Jiao Tong University) · Haibing Guan (Shanghai Jiaotong University)
Robust Depth Enhancement via Polarization Prompt Fusion Tuning
Kei IKEMURA (KTH Royal Institute of Technology) · Yiming Huang (HKUST) · Felix Heide (Department of Computer Science, Princeton University) · Zhaoxiang Zhang (Institute of Automation, Chinese Academy of Sciences) · Qifeng Chen (Hong Kong University of Science and Technology) · Chenyang Lei (The Hong Kong University of Science and Technology)
PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
Ying-Tian Liu (Tsinghua University) · Yuan-Chen Guo (Tsinghua University) · Guan Luo (Tsinghua University) · Heyi Sun (Tsinghua University) · Wei Yin (Shenzhen DJI Sciences and Technologies Ltd.) · Song-Hai Zhang (Tsinghua University)
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
Tianming Liang (Sun Yat-sen University) · Chaolei Tan (SUN YAT-SEN UNIVERSITY) · Beihao Xia (Huazhong University of Science and Technology) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY) · Jian-Fang Hu (SUN YAT-SEN UNIVERSITY)
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Xiaoyu Zhou (Peking University) · Zhiwei Lin (Peking University) · Xiaojun Shan (Peking University) · Yongtao Wang (Peking University) · Deqing Sun (Google) · Ming-Hsuan Yang (University of California at Merced)
Interactive3D: Create What You Want by Interactive 3D Generation
Shaocong Dong (Hong Kong University of Science and Technology) · Lihe Ding (The Chinese University of Hong Kong) · Zhanpeng Huang (SenseTime Research) · Zibin Wang (Sensetime Group Limited) · Tianfan Xue (The Chinese University of Hong Kong) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology)
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Leonardo Iurada (Polytechnic Institute of Turin) · Marco Ciccone (Politecnico di Torino) · Tatiana Tommasi (Politecnico di Torino)
CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image
Donggeun Yoon (Chungnam National University / KETI) · Donghyeon Cho (Hanyang University)
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan (VGG, University of Oxford) · Chuanxia Zheng (University of Oxford) · Weidi Xie (Shanghai Jiaotong University) · Andrew Zisserman (University of Oxford)
MiKASA: Multi-Key-Anchor Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang (DFKI) · Shaoxiang Wang (German Research Center for AI) · Alain Pagani (German Research Center for Artificial Intelligence (DFKI)) · Didier Stricker (Universität Kaiserslautern)
Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling
Ziwen Li (Huazhong University of Science and Technology) · Feng Zhang (Huazhong University of Science and Technology) · Meng Cao (Mohamed bin Zayed University of Artificial Intelligence) · Jinpu Zhang (Huazhong University of Science and Technology) · Yuanjie Shao (Huazhong University of Science and Technology) · Yuehuan Wang (Huazhong University of Science and Technology) · Nong Sang (Huazhong University of Science and Technology)
Gaussian Splatting SLAM
Hidenobu Matsuki (Imperial College London) · Riku Murai (Imperial College London) · Paul Kelly (Imperial College London) · Andrew J. Davison (Imperial College London)
A Simple Baseline for Efficient Hand Mesh Reconstruction
zhishan zhou (None) · shihao zhou (None) · Zhi Lv (None) · minqiang zou (None) · Yao Tang (None) · Jiajun Liang (None)
EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting
Zitao Wang (None) · Qiguang Miao (Xidian University) · Yue Xi (Xi'an University of Electronic Science and Technology) · Peipei Zhao (Xi'an University of Electronic Science and Technology)
Privacy-preserving Optics for Enhancing Protection in Face De-identification
Jhon Lopez (Universidad Industrial de Santander) · Carlos Hinojosa (KAUST) · Henry Arguello (Universidad Industrial de Santander) · Bernard Ghanem (KAUST)
BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks
Shangqian Gao (University of Pittsburgh) · Yanfu Zhang (College of William and Mary) · Feihu Huang (Nanjing University of Aeronautics and Astronautics) · Heng Huang (University of Pittsburgh)
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo (Tsinghua University) · Xingqian Xu (University of Illinois, Urbana Champaign) · Yifan Pu (Tsinghua University) · Zanlin Ni (Tsinghua University) · Chaofei Wang (Tsinghua University) · Manushree Vasu (Georgia Institute of Technology) · Shiji Song (Tsinghua University) · Gao Huang (Tsinghua University) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization
Hongwei Ren (Hong Kong University of Science and Technology) · Jiadong Zhu (The Hong Kong University of Science and Technology (Guangzhou)) · Yue Zhou (Hong Kong University of Science and Technology) · Haotian FU (Hong Kong University of Science and Technology) · Yulong Huang (Central South University) · Bojun Cheng (Hong Kong University of Science and Technology)
ReCoRe: Regularized Contrastive Representation Learning of World Model
Rudra P.K. Poudel (Toshiba Europe Ltd) · Harit Pandya (Toshiba Europe) · Stephan Liwicki (Toshiba Europe Ltd) · Roberto Cipolla (University of Cambridge)
Unmixing before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis
Yang Yu (None) · Erting Pan (Wuhan University) · Xinya Wang (Wuhan University) · Yuheng Wu (Wuhan University) · Xiaoguang Mei (Wuhan University) · Jiayi Ma (Wuhan University)
Utility-Fairness Trade-Offs and How to Find Them
Sepehr Dehdashtian (Michigan State University) · Bashir Sadeghi (Michigan State University) · Vishnu Naresh Boddeti (None)
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng (Department of Computer Science, University of Oxford) · Matheus Gadelha (Adobe Systems) · Thibault Groueix (Adobe Systems) · Matthew Fisher (Adobe Research) · Radomir Mech (University of Calgary) · Andrew Markham (University of Oxford) · Niki Trigoni (University of Oxford)
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue (Ohio State University) · Yuansheng Ni (University of Waterloo) · Kai Zhang (Ohio State University, Columbus) · Tianyu Zheng (Beijing University of Posts and Telecommunications) · Ruoqi Liu (Ohio State University) · Ge Zhang (University of Waterloo) · Samuel Stevens (Ohio State University, Columbus) · Dongfu Jiang (University of Waterloo) · Weiming Ren (University of Waterloo) · Yuxuan Sun (Westlake University) · Cong Wei (University of Waterloo) · Botao Yu (The Ohio State University) · Ruibin Yuan (Hong Kong University of Science and Technology) · Renliang Sun (International Digital Economy Academy) · Ming Yin (Princeton University) · Boyuan Zheng (Ohio State University, Columbus) · Zhenzhu Yang (China University of Geoscience Beijing) · Yibo Liu (University of Victoria) · Wenhao Huang (BAAI) · Huan Sun (Ohio State University, Columbus) · Yu Su (The Ohio State University) · Wenhu Chen (University of Waterloo)
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Qu Wentao (Nanjing University of Science and Technology) · Yuantian Shao (Nanjing University of Science and Technology) · Lingwu Meng (Nanjing University of Science and Technology) · Xiaoshui Huang (Shanghai AI Laboratory) · Liang Xiao (Nanjing University of Science and Technology)
Efficient Solution of Point-Line Absolute Pose
Petr Hruby (Department of Computer Science, ETHZ - ETH Zurich) · Timothy Duff (University of Washington) · Marc Pollefeys (ETH Zurich / Microsoft)
CAMixerSR: Only Details Need More "Attention"
Yan Wang (Nankai University) · Yi Liu (ByteDance Inc.) · Shijie Zhao (ByteDance Inc.) · Junlin Li (ByteDance Inc.) · Li zhang (Bytedance Inc.)
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He () · Yiheng Deng (Zhejiang University) · SHIXIANG TANG (The Chinese University of Hong Kong) · Qihao CHEN (Liaoning Technical University) · Qingsong Xie (OPPO) · Yizhou Wang (None) · Lei Bai (Shanghai AI Laboratory) · Feng Zhu (SenseTime Group LTD) · Rui Zhao (Qing Yuan Research Institute, Shanghai Jiao Tong University) · Wanli Ouyang (University of Sydney) · Donglian Qi (Zhejiang University) · Yunfeng Yan (Zhejiang University)
CoDe: An Explicit Content Decoupling Framework for Image Restoration
Enxuan Gu (Dalian University of Technology) · Hongwei Ge (Dalian University of Technology) · Yong Guo (Max-Planck Institute for Informatics)
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu (Shanghai Jiao Tong University) · Xintao Lv (Shanghai Jiaotong University) · Yichao Yan (Shanghai Jiao Tong University) · Xin Jin (Eastern Institute of Technology, Ningbo) · Wu Shuwen (Shanghai Jiaotong University) · Congsheng Xu (Shanghai Jiaotong University) · Yifan Liu (Shanghai Jiao Tong University) · Yizhou Zhou (WeChat AI) · Fengyun Rao (WeChat, Tencent Inc.) · Xingdong Sheng (Shanghai Jiaotong University) · Yunhui LIU (Lenovo Research) · Wenjun Zeng (None) · Xiaokang Yang (Shanghai Jiao Tong University, China)
One-Shot Open Affordance Learning with Foundation Models
Gen Li (University of Edinburgh) · Deqing Sun (Google) · Laura Sevilla-Lara (University of Edinburgh) · Varun Jampani (Google Research)
Self-Supervised Dual Contouring
Ramana Sundararaman (École Polytechnique) · Roman Klokov (École Polytechnique) · Maks Ovsjanikov (Ecole Polytechnique, France)
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman (Oregon State University) · Stefan Lee (Oregon State University) · Scott Cohen (Adobe Systems) · Kushal Kafle (Adobe Systems)
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang (City University of Hong Kong) · Weiming Zhuang (Sony Research) · Chen Chen (Sony AI) · Lingjuan Lyu (Sony AI)
Tactile-Augmented Radiance Fields
Yiming Dou (University of Michigan - Ann Arbor) · Fengyu Yang (Yale University) · Yi Liu (University of Michigan - Ann Arbor) · Antonio Loquercio (University of California, Berkeley) · Andrew Owens (University of Michigan)
Consistent Prompting for Rehearsal-Free Continual Learning
Zhanxin Gao (Sun Yat-sen University) · Jun Cen (None) · Xiaobin Chang (SUN YAT-SEN UNIVERSITY)
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park (Pohang University of Science and Technology (POSTECH)) · Jeongyeon Hwang (Pohang University of Science and Technology) · Sunung Mun (Pohang University of Science and Technology) · Sangdon Park (POSTECH) · Jungseul Ok (POSTECH)
Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion
Litu Rout (University of Texas at Austin) · Yujia Chen (Google) · Abhishek Kumar (Google DeepMind) · Constantine Caramanis (University of Texas, Austin) · Sanjay Shakkottai (University of Texas, Austin) · Wen-Sheng Chu (Google Research)
Purified and Unified Steganographic Network
GuoBiao Li (Fudan University) · Sheng Li (Fudan University) · Zicong Luo (Fudan University) · Zhenxing Qian (Fudan University) · Xinpeng Zhang (Fudan University)
Deformable One-shot Face Stylization via DINO Semantic Guidance
Yang Zhou (Shenzhen University) · Zichong Chen (Shenzhen University) · Hui Huang (Shenzhen University)
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam (National Yang Ming Chiao Tung University) · Cheng-Kun Yang (MediaTek) · Min-Hung Chen (NVIDIA) · Jen-Hui Chuang (None) · Yen-Yu Lin (National Yang Ming Chiao Tung University)
Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures
Huijie Zhang (University of Michigan - Ann Arbor) · Yifu Lu (University of Michigan - Ann Arbor) · Ismail Alkhouri (Michigan State University; University of Michigan) · Saiprasad Ravishankar (Michigan State University) · Dogyoon Song (University of Michigan - Ann Arbor) · Qing Qu (University of Michigan)
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Yang Zhou (SenseTime Research) · Hao Shao (The Chinese University of Hong Kong) · Letian Wang (University of Toronto) · Steven L. Waslander (University of Toronto) · Hongsheng Li (The Chinese University of Hong Kong) · Yu Liu (The Chinese University of Hong Kong)
VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan (University of Massachusetts Amherst) · Difan Liu (Adobe Research) · Shantanu Agarwal (Balbix) · Matthew Fisher (Adobe Research) · Michaël Gharbi (Massachusetts Institute of Technology) · Oliver Wang (Adobe Research) · Alec Jacobson (University of Toronto and Adobe Systems) · Evangelos Kalogerakis (UMass Amherst)
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Haichao Zhang (Northeastern University) · Yi Xu (Northeastern University) · Hongsheng Lu (Toyota Motor North America) · Takayuki Shimizu (Toyota Motor North America, Inc.) · Yun Fu (Northeastern University)
Learned representation-guided diffusion models for large-image generation
Alexandros Graikos (Stony Brook University) · Srikar Yellapragada (Stony Brook University) · Minh-Quan Le (State University of New York at Stony Brook) · Saarthak Kapse (State University of New York at Stony Brook) · Prateek Prasanna (State University of New York, Stony Brook) · Joel Saltz (State University of New York at Stony Brook) · Dimitris Samaras (Stony Brook University)
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han (Amazon/AWS) · Shuai Zhang (Amazon) · Xingjian Shi (Boson AI) · Markus Reichstein (Max-Planck Institute)
Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang (Tsinghua University) · Quanmin Liang (Sun Yat-sen University) · Yijie Yu (Tsinghua University) · Chujun Qin (China Southern Power Grid) · Xiawu Zheng (Xiamen University) · Kai Huang (SUN YAT-SEN UNIVERSITY) · Zikun Zhou (Peng Cheng Laboratory) · Wenming Yang (Tsinghua University)
Video Harmonization with Triplet Spatio-Temporal Variation Patterns
Zonghui Guo () · XinYu Han (Ocean University of China) · Jie Zhang (Institute of Computing Technology, Chinese Academy of Sciences) · Shiguang Shan (Institute of Computing Technology, Chinese Academy of Sciences) · Haiyong Zheng (Ocean University of China)
Semantic Line Combination Detector
JINWON KO (Korea University, Seoul) · Dongkwon Jin (Korea University) · Chang-Su Kim (Korea University)
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Chong Bao (Zhejiang University) · Yinda Zhang (Google) · Yuan Li (Zhejiang University) · Xiyu Zhang (Zhejiang University) · Bangbang Yang (ByteDance Inc) · Hujun Bao (Zhejiang University) · Marc Pollefeys (ETH Zurich / Microsoft) · Guofeng Zhang (Zhejiang University) · Zhaopeng Cui (None)
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Lingteng Qiu (None) · Guanying Chen (The Chinese University of Hong Kong, Shenzhen) · Xiaodong Gu (Alibaba Group) · Qi Zuo (Alibaba Group) · Mutian Xu (None) · Yushuang Wu (The Chinese University of Hong Kong (Shenzhen)) · Weihao Yuan (Alibaba Group) · Zilong Dong (Alibaba Group) · Liefeng Bo (None) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero (EPFL - EPF Lausanne) · Luca Zancato (AWS AI Labs) · Matthew Trager (Amazon) · Siddharth Choudhary (Amazon AGI) · Pramuditha Perera (Amazon) · Alessandro Achille (California Institute of Technology) · Ashwin Swaminathan (University of Maryland, College Park) · Stefano Soatto (AWS)
Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis
Atefeh Khoshkhahtinat (None) · Ali Zafari (West Virginia University) · Piyush Mehta (West Virginia University) · Nasser Nasrabadi (West Virginia University)
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection
Chengjie Wang (Tencent Youtu Lab; Shanghai Jiao Tong University) · wenbing zhu (Fudan University) · Bin-Bin Gao (None) · Zhenye Gan (Tencent Youtu Lab) · Jiangning Zhang (Tencent Youtu Lab) · Zhihao Gu (Shanghai Jiao Tong University) · Bruce Qian (None) · Mingang Chen (Shanghai Development Center of Computer Software Technology) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang (University of Adelaide) · Bohan Zhuang (Monash University) · Qi Wu (University of Adelaide)
3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
Felix Taubner (LG Electronics) · Prashant Raina (LG Electronics) · Mathieu Tuli (LG Electronics Canada Incorporated, TAIL) · Eu Wern Teh (LG Corporation) · Chul Lee (LG Electronics) · Jinmiao Huang (Meta)
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Peng Dai (Bytedance Inc.) · Yang Zhang (None) · Tao Liu (ByteDance Inc.) · ZhenFan (Bytedance) · Tianyuan Du (Bytedance) · Zhuo Su (ByteDance) · Xiaozheng Zheng (ByteDance) · Zeming Li (BYTEDANCE)
Traffic Scene Parsing through the TSP6K Dataset
Peng-Tao Jiang (vivo Mobile Communication (Hangzhou) Co., Ltd.) · Yuqi Yang (Nankai University) · Yang Cao (Hong Kong University of Science and Technology) · Qibin Hou (Nankai University) · Ming-Ming Cheng (Nankai University, Tsinghua University) · Chunhua Shen (Zhejiang University)
Garment Recovery with Shape and Deformation Priors
Ren Li (EPFL) · Corentin Dumery (EPFL) · Benoît Guillard (Swiss Federal Institute of Technology Lausanne) · Pascal Fua (Swiss Federal Institute of Technology Lausanne)
KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas (Apple Inc.) · Yao-Hung Hubert Tsai (Apple) · Timothy Barfoot (University of Toronto) · Jian Zhang (Apple)
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Muyang Li (None) · Tianle Cai (Princeton University) · Jiaxin Cao (Lepton AI) · Qinsheng Zhang (Georgia Institute of Technology) · Han Cai (Massachusetts Institute of Technology) · Junjie Bai (Lepton AI Inc.) · Yangqing Jia (Lepton AI) · Kai Li (Princeton University) · Song Han (Massachusetts Institute of Technology)
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Sadeep Jayasumana (Google) · Srikumar Ramalingam (Google) · Andreas Veit (Google) · Daniel Glasner (Google) · Ayan Chakrabarti (Google) · Sanjiv Kumar (Google)
Text-Driven Image Editing via Learnable Regions
Yuanze Lin (University of Oxford) · Yi-Wen Chen (University of California, Merced) · Yi-Hsuan Tsai (Google) · Lu Jiang (Carnegie Mellon University) · Ming-Hsuan Yang (University of California at Merced)
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Feng Lu (Tsinghua University) · Xiangyuan Lan (Peng Cheng Laboratory) · Lijun Zhang (University of Chinese Academy of Sciences) · Dongmei Jiang (Peng Cheng Laboratory) · Yaowei Wang (Peng Cheng Laboratory) · Chun Yuan (Tsinghua University)
pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
David Charatan (Massachusetts Institute of Technology) · Sizhe Lester Li (Massachusetts Institute of Technology) · Andrea Tagliasacchi (Simon Fraser University, Google Brain) · Vincent Sitzmann (Massachusetts Institute of Technology)
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An (University of Copenhagen) · Guolei Sun (None) · Yun Liu (Institute for Infocomm Research, A*STAR) · Fayao Liu (Institute for Infocomm Research, A*STAR) · Zongwei Wu (Bayerische Julius-Maximilians-Universität Würzburg) · Dan Wang (University of Copenhagen) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Serge Belongie (University of Copenhagen)
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang (China University of Petroleum (East China)) · Siyue Yu (Xi'an Jiaotong-Liverpool University) · Yunchao Wei (Beijing Jiaotong University) · Yao Zhao (Beijing Jiaotong University) · Jimin Xiao (Xi'an Jiaotong-Liverpool University)
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu (Peking University) · Fangyun Wei (None) · Yanye Lu (Peking University)
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
Jaewon Jung (Seoul National University) · Hongsun Jang (Seoul National University) · Jaeyong Song (Seoul National University) · Jinho Lee (Seoul National University)
Adversarial Distillation Based on Slack Matching and Attribution Region Alignment
Shenglin Yin (Peking University) · Zhen Xiao (Peking University) · Mingxuan Song (Peking University) · Jieyi Long (Theta Labs, Inc.)
Universal Robustness via Median Random Smoothing for Real-World Super-Resolution
Zakariya Chaouai (Paris-Saclay University, CEA, List) · Mohamed Tamaazousti (CEA/LIST)
RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection
Zhiwei Lin (Peking University) · Zhe Liu (University of Electronic Science and Technology of China) · Zhongyu Xia (Peking University) · Xinhao Wang (Peking University) · Yongtao Wang (Peking University) · Shengxiang Qi (Chongqing Changan Automobile Co., Ltd) · Yang Dong (Chongqing Changan Automobile Co., Ltd.) · Nan Dong (changan) · Le Zhang (University of Electronic Science and Technology of China) · Ce Zhu (University of Electronic Science and Technology of China)
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang (Nanyang Technological University) · Yinan He (Shanghai AI Laboratory) · Jiashuo Yu (Shanghai AI Laboratory) · Fan Zhang (None) · Chenyang Si (Nanyang Technological University Singapore) · Yuming Jiang (Nanyang Technological University) · Yuanhan Zhang (Nanyang Technological University) · Tianxing Wu (Nanyang Technological University) · Jin Qingyang (Nanyang Technological University) · Nattapol Chanpaisit (Nanyang Technological University) · Yaohui Wang (Shanghai AI Laboratory) · Xinyuan Chen (Shanghai Artificial Intelligence Laboratory) · Limin Wang (Nanjing University) · Dahua Lin (The Chinese University of Hong Kong) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Ziwei Liu (Nanyang Technological University)
SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder
Dihan Zheng (Tsinghua University) · Yihang Zou (Tsinghua University) · Xiaowen Zhang (Hisilicon) · Chenglong Bao (Tsinghua University)
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
Zechuan Zhang (Zhejiang University) · Zongxin Yang (Zhejiang University) · Yi Yang (Zhejiang University)
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang (Shanghai AI Laboratory) · Shenyuan Gao (HKUST) · Yihang Qiu (Shanghai Jiao Tong University) · Li Chen (The University of Hong Kong) · Tianyu Li (Fudan University) · Bo Dai (Shanghai AI Laboratory) · Kashyap Chitta () · Penghao Wu (University of California, San Diego) · Jia Zeng (Shanghai Jiaotong University) · Ping Luo (The University of Hong Kong) · Jun Zhang (The Hong Kong University of Science and Technology) · Andreas Geiger (University of Tübingen) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Hongyang Li (Shanghai AI Lab)
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Chenyu You (Yale University) · Yifei Min (Yale University) · Weicheng Dai (Yale University) · Jasjeet Sekhon (Yale University) · Lawrence Staib (Yale University) · James Duncan (Yale University)
Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Song Tang (University of Shanghai for Science and Technology) · Wenxin Su (University of Shanghai for Science and Technology) · Mao Ye (University of Electronic Science and Technology of China) · Xiatian Zhu (University of Surrey)
Grounding and Enhancing Grid-based Models for Neural Fields
Zelin Zhao (Shanghai Jiao Tong University) · FENGLEI FAN (The Chinese University of Hong Kong) · Wenlong Liao (Shanghai Jiaotong University) · Junchi Yan (Shanghai Jiao Tong University)
Neural Sign Actors: A diffusion model for 3D sign language production from text
Vasileios Baltatzis (None) · Rolandos Alexandros Potamias (Imperial College London) · Evangelos Ververas (Huawei Technologies Ltd.) · Guanxiong Sun (Huawei Technologies Ltd.) · Jiankang Deng (Imperial College London & Huawei UKRD) · Stefanos Zafeiriou (Imperial College London)
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
David Rozenberszki (None) · Or Litany (NVIDIA / Technion) · Angela Dai ()
HumMUSS: Human Motion Understanding using State Space Models
Arnab Mondal (McGill University) · Stefano Alletto (Apple) · Denis Tome (Apple)
APISR: Anime Production Inspired Real-World Anime Super-Resolution
Boyang Wang (University of Michigan - Ann Arbor) · Fengyu Yang (Yale University) · Xihang Yu (University of Michigan - Ann Arbor) · Chao Zhang (Zhejiang University) · Hanbin Zhao (Zhejiang University)
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
SICHENG MO (University of California, Los Angeles) · Fangzhou Mu (University of Wisconsin-Madison) · Kuan Heng Lin (University of California, Los Angeles) · Yanli Liu (Shein Technology LLC) · Bochen Guan (OPPO US Research Center) · Yin Li (University of Wisconsin, Madison) · Bolei Zhou (University of California, Los Angeles)
ShapeWalk: Compositional Shape Editing through Language-Guided Chains
Habib Slim (KAUST) · Mohamed Elhoseiny (KAUST)
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Pancheng Zhao (Nankai University) · Peng Xu (Tsinghua University) · Pengda Qin (Alibaba Group) · Deng-Ping Fan (ETH Zurich) · Zhicheng Zhang (Nankai University) · Guoli Jia (None) · Bowen Zhou (Tsinghua University) · Jufeng Yang (None)
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
Yunqi Miao (The university of Warwick) · Jiankang Deng (Imperial College London & Huawei UKRD) · Jungong Han (Aberystwyth University)
Hierarchical Histogram Threshold Segmentation – Auto-terminating High-detail Oversegmentation
Thomas Chang (Nuremberg Institute of Technology) · Simon Seibt (Georg-Simon-Ohm-Fachhochschule Nürnberg) · Bartosz von Rymon Lipinski (Technical University oAS Nuremberg)
G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping
Junfeng Cheng (Imperial College London) · Tania Stathaki (Imperial College London)
Dual-Enhanced Coreset Selection with Class-wise Collaboration for Online Blurry Class Incremental Learning
Yutian Luo (Renmin University of China) · Shiqi Zhao (China Unicom Research Institute) · Haoran Wu (China Unicom Research Institute) · Zhiwu Lu (Renmin University of China)
TULIP: Multi-camera 3D Precision Assessment of Parkinson's Disease
Kyungdo Kim (Duke University) · Sihan Lyu (Duke University) · Sneha Mantri (Duke University) · Timothy DUNN (Duke University)
BANF: Band-limited Neural Fields for Levels of Detail Reconstruction
Ahan Shabanov (Simon Fraser University) · Shrisudhan Govindarajan (Simon Fraser University) · Cody Reading (Simon Fraser University) · Leili Goli (University of Toronto) · Daniel Rebain (University of British Columbia) · Kwang Moo Yi (University Of British Columbia) · Andrea Tagliasacchi (Simon Fraser University, Google Brain)
Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle
Hyeokjun Kweon (KAIST) · Jihun Kim (KAIST) · Kuk-Jin Yoon (KAIST)
StraightPCF: Straight Point Cloud Filtering
Dasith de Silva Edirimuni (Deakin University) · Xuequan Lu (La Trobe University) · Gang Li (Deakin University) · Lei Wei (Deakin University) · Antonio Robles-Kelly (Defence Science and Technology Group (DST), Deakin University) · Hongdong Li (Australian National University)
SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving
Yiming Xie (Shenzhen International Graduate School, Tsinghua University) · Henglu Wei (Tsinghua University) · Zhenyi Liu (Stanford University) · Xiaoyu Wang (Department of Automation, Tsinghua University) · Xiangyang Ji (Tsinghua University)
An Empirical Study of Scaling Law for Scene Text Recognition
Miao Rang (Huawei Noah's Ark Lab) · Zhenni Bi (Huawei Noah Ark Lab) · Chuanjian Liu (Huawei Technologies Ltd.) · Yunhe Wang (Huawei Noah's Ark Lab) · Kai Han (Huawei Noah's Ark Lab)
PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition
Haosong Zhang (School of Computer Science and Engineering, Nanyang Technological University) · Mei Leong (, A*STAR) · Liyuan Li (I2R, A*STAR) · Weisi Lin (Nanyang Technological University)
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou (Computer Science, Tsinghua University) · Chao Yang (Shanghai AI Laboratory) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Chengbin Quan (Tsinghua University) · Youjian Zhao (Tsinghua University)
UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan (Duke University, Meta, TikTok) · Lei Luo (Meta) · Zhuo Hui (Facebook) · Can Pu (Facebook) · Xiaoyu Xiang (Meta) · Rakesh Ranjan () · Denis Demandolx (Meta)
Low-power, Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann (TU Berlin) · Suman Ghosh (TU Berlin) · Ignacio Juarez Martinez (University of Oxford) · Tom Hart (Oxford Brookes University) · Alex Kacelnik (University of Oxford) · Guillermo Gallego (TU Berlin-ECDF-SCIoI)
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Taeho Kang (Seoul National University) · Youngki Lee (Seoul National University)
GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Yuan Dong (Alibaba Group) · Qi Zuo (Alibaba Group) · Xiaodong Gu (Alibaba Group) · Weihao Yuan (Alibaba Group) · zhengyi zhao (Alibaba Group) · Zilong Dong (Alibaba Group) · Liefeng Bo (None) · Qixing Huang (University of Texas at Austin)
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
Yuhan Shen (Northeastern University) · Ehsan Elhamifar (None)
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Haoning Wu (Nanyang Technological University) · Zicheng Zhang (Shanghai Jiaotong University) · Erli Zhang (Nanyang Technological University) · Chaofeng Chen (Nanyang Technological University) · Liang Liao (Nanyang Technological University) · Annan Wang (Nanyang Technological University) · Kaixin Xu (I2R, A*STAR) · Chunyi Li (None) · Jingwen Hou (Nanyang Technological University) · Guangtao Zhai (Shanghai Jiao Tong University) · Xue Geng (Institute for Infocomm Research, A*STAR) · Wenxiu Sun (SenseTime Research and Tetras.AI) · Qiong Yan (SenseTime Research) · Weisi Lin (Nanyang Technological University)
Enhancing the Power of OOD Detection via Sample-Aware Model Selection
Feng Xue (Shanghai Jiaotong University) · Zi He (HuNan University) · Yuan Zhang (Beijing Normal University) · Chuanlong Xie (Beijing Normal University) · Zhenguo Li (Huawei) · Falong Tan (Hunan University)
REACTO: Reconstructing Articulated Objects from a Single Video
Chaoyue Song (Nanyang Technological University) · Jiacheng Wei (Nanyang Technological University) · Chuan-Sheng Foo (Centre for Frontier AI Research, A*STAR) · Guosheng Lin (Nanyang Technological University) · Fayao Liu (Institute for Infocomm Research, A*STAR)
Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement
Daiwei Yu (Hangzhou City University) · Zhuorong Li (Hangzhou City University) · Lina Wei (Hangzhou City University) · Canghong Jin (Hangzhou City University) · Yun Zhang (Hangzhou City University) · Sixian Chan (the College of Computer Science and Technology at Zhejiang University of Technology)
Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training
Shizhan Gong (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Qi Dou (The Chinese University of Hong Kong) · Farzan Farnia (The Chinese University of Hong Kong)
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita (The University of Tokyo) · Naoto Inoue (CyberAgent) · Kotaro Kikuchi (None) · Kota Yamaguchi (CyberAgent) · Kiyoharu Aizawa (The University of Tokyo)
A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
Siddharth Srivastava (TensorTour Inc) · Gaurav Sharma (TensorTour Inc.)
Transfer CLIP for Generalizable Image Denoising
Jun Cheng (Huazhong University of Science and Technology) · Dong Liang (Huazhong University of Science and Technology) · Shan Tan (Huazhong University of Science and Technology)
LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
Jing Zhang (New York University) · Irving Fang (New York University) · Hao Wu (New York University) · Akshat Kaushik (New York University) · Alice Rodriguez (New York University) · Hanwen Zhao (New York University) · Juexiao Zhang (New York University) · Zhuo Zheng (Stanford University) · Radu Iovita (New York University) · Chen Feng (New York University)
Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
Mingyuan Meng (The University of Sydney) · Dagan Feng (University of Sydney) · Lei Bi (the University of Sydney) · Jinman Kim (University of Sydney)
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren (Beijing Jiaotong University) · Zhicheng Huang (University of Science and Technology Beijing) · Yunchao Wei (Beijing Jiaotong University) · Yao Zhao (Beijing Jiaotong University) · Dongmei Fu (University of Science and Technology Beijing) · Jiashi Feng (ByteDance) · Xiaojie Jin (ByteDance Inc./TikTok)
Cross Initialization for Face Personalization of Text-to-Image Models
Lianyu Pang (None) · Jian Yin () · Haoran Xie (Lingnan University) · Qiping Wang (East China Normal University) · Qing Li (The Hong Kong Polytechnic University) · Xudong Mao (None)
Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
Daniel Rebain (University of British Columbia) · Soroosh Yazdani (Google) · Kwang Moo Yi (University Of British Columbia) · Andrea Tagliasacchi (Simon Fraser University, Google Brain)
OmniMotionGPT: Animal Motion Generation with Limited Data
Zhangsihao Yang (None) · Mingyuan Zhou (Innopeak Technology) · Mengyi Shan (University of Washington) · Bingbing Wen (University of Washington) · Ziwei Xuan (Innopeak Technology) · Mitch Hill (None) · Junjie Bai (CuraCloud Corporation) · Guo-Jun Qi (University of Central Florida) · Yalin Wang (Arizona State University)
Stratified Avatar Generation from Sparse Observations
Han Feng (Wuhan University) · Wenchao Ma (Pennsylvania State University) · Quankai Gao (University of Southern California) · Xianwei Zheng (Wuhan University) · Nan Xue (Ant Group) · Huijuan Xu (Pennsylvania State University--University Park)
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
Gangwei Xu (Huazhong University of Science and Technology) · Yujin Wang (Shanghai Artificial Intelligence Laboratory) · Jinwei Gu (The Chinese University of Hong Kong) · Tianfan Xue (The Chinese University of Hong Kong) · Xin Yang (Huazhong University of Science and Technology)
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang (Institute of Computing Technology, Chinese Academy of Sciences) · Xiangyang Li (Institute of Computing Technology, Chinese Academy of Sciences) · Jiahao Yang (Institute of Computing Technology, Chinese Academy of Sciences) · Yeqi Liu (Institute of Computing Technology, Chinese Academy of Sciences) · Junjie Hu (University of Wisconsin, Madison) · Ming Jiang (Indiana University) · Shuqiang Jiang (Institute of Computing Technology, Chinese Academy of Sciences)
Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations
Kewei Wang (Huazhong University of Science and Technology) · Yizheng Wu (Nanyang Technological University) · Jun Cen (None) · Zhiyu Pan (None) · Xingyi Li (Huazhong University of Science and Technology) · Zhe Wang (Sensetime Group Limited) · Zhiguo Cao () · Guosheng Lin (Nanyang Technological University)
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil (ML Labs, Dublin City University) · Raiymbek Akshulakov (University of California, Berkeley) · YASSER ABDELAZIZ DAHOU DJILALI (Technology Innovation Institute) · Mohamed El Amine Seddik (Technology Innovation Institute) · Sanath Narayan (Technology Innovation Institute) · Karttikeya Mangalam (University of California Berkeley) · Noel O'Connor (Dublin City University)
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Zhongwei Zhang (University of Science and Technology of China) · Fuchen Long (JD.com) · Yingwei Pan (HiDream.ai) · Zhaofan Qiu (University of Science and Technology of China) · Ting Yao (JD AI Research) · Yang Cao (University of Science and Technology of China) · Tao Mei (JD Explore Academy)
Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
Chunghyun Park (POSTECH) · Seungwook Kim (POSTECH) · Jaesik Park (Seoul National University) · Minsu Cho (POSTECH)
FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
Rishub Tamirisa (Lapis Labs) · Chulin Xie (University of Illinois, Urbana Champaign) · Wenxuan Bao (University of Illinois Urbana Champaign) · Andy Zhou (Lapis Labs) · Ron Arel (Lapis Labs, UIUC) · Aviv Shamsian (Bar-Ilan University)
Test-Time Adaptation for Depth Completion
Hyoungseob Park (Yale University) · Anjali W Gupta (Yale) · Alex Wong (Yale University)
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li (SIAT, UCAS) · Yali Wang (SIAT, Chinese Academy of Sciences) · Yinan He (Shanghai AI Laboratory) · Yizhuo Li (The University of Hong Kong) · Yi Wang (Shanghai AI Laboratory) · Yi Liu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Zun Wang (Australian National University) · Jilan Xu (None) · Guo Chen (Nanjing University) · Ping Luo (The University of Hong Kong) · Limin Wang (Nanjing University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory)
Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning
Hao Zhang (Wuhan University) · Linfeng Tang (Wuhan University) · Xinyu Xiang (Wuhan University) · Xuhui Zuo (Wuhan University) · Jiayi Ma (Wuhan University)
Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
Shuji Habuchi (Nagoya University) · Keita Takahashi (Nagoya University) · Chihiro Tsutake (Nagoya University) · Toshiaki Fujii (Nagoya University) · Hajime Nagahara (Osaka University)
Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu (The Chinese University of Hong Kong) · Yuechen Zhang (The Chinese University of Hong Kong) · Wenbo Li (Huawei Technologies Ltd.) · Zhe Lin (Adobe Research) · Jiaya Jia (The Chinese University of Hong Kong)
GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu () · Han Xue (Tsinghua University) · Kunming Luo (Hong Kong University of Science and Technology) · Ping Tan (Hong Kong University of Science and Technology) · Li Yi ()
Dual-scale Transformer for Large-scale Single-Pixel Imaging
Gang Qu (Westlake University) · Ping Wang (Zhejiang University) · Xin Yuan (Westlake University)
A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
Zhixiong Yang (National University of Defense Technology) · Jingyuan Xia (National University of Defense Technology) · Shengxi Li (Beihang University) · Xinghua Huang (National University of Defense Technology) · Shuanghui Zhang (National University of Defense Technology) · Zhen Liu (National University of Defense Technology) · Yaowen Fu (National University of Defense Technology) · Yongxiang Liu (National University of Defense Technology)
Parameter Efficient Self-Supervised Geospatial Domain Adaptation
Linus Scheibenreif (University of St.Gallen) · Michael Mommert (Stuttgart University of Applied Sciences) · Damian Borth (University of St.Gallen)
Multimodal Representation Learning by Alternating Unimodal Adaptation
Xiaohui Zhang (University of Notre Dame) · Jaehong Yoon (University of North Carolina at Chapel Hill) · Mohit Bansal (University of North Carolina at Chapel Hill) · Huaxiu Yao (Department of Computer Science, University of North Carolina at Chapel Hill)
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
Octave Mariotti (University of Edinburgh) · Oisin Mac Aodha (University of Edinburgh) · Hakan Bilen (University of Edinburgh)
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
Zhengang Li (Northeastern University) · Yan Kang (None) · Yuchen Liu (None) · Difan Liu (Adobe Research) · Tobias Hinz (Adobe Systems) · Feng Liu (Adobe Systems) · Yanzhi Wang (Northeastern University)
Compositional Video Understanding with Spatiotemporal Structure-based Transformers
Hoyeoung Yun (Hanyang University) · Jinwoo Ahn (Hanyang University) · Minseo Kim (Hanyang University) · Eun-Sol Kim (Hanyang University)
CoDi-2: Interleaved and In-Context Any-to-Any Generation
Zineng Tang (University of North Carolina, Chapel Hill) · Ziyi Yang (Microsoft) · MAHMOUD KHADEMI (Microsoft) · Yang Liu (Microsoft) · Chenguang Zhu (Zoom) · Mohit Bansal (University of North Carolina at Chapel Hill)
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Yamei Chen (Technische Universität München) · Yan Di (Technische Universität München) · Guangyao Zhai (Technical University of Munich) · Fabian Manhardt (Google) · Chenyangguang Zhang (Tsinghua University) · Ruida Zhang (Department of Automation, Tsinghua University) · Federico Tombari (Google, TUM) · Nassir Navab (TU Munich) · Benjamin Busam (Technical University of Munich)
ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
Dar-Yen Chen (SketchX) · Hamish Tennent (PicCollage) · Ching-Wen Hsu (PicCollage)
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee (Arizona State University) · Tejas Gokhale (University of Maryland, Baltimore County) · Chitta Baral (Arizona State University) · 'YZ' Yezhou Yang (Arizona State University)
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae (GIST) · Young-Jae Park (GIST) · Hae-Gon Jeon (GIST)
TexVocab: Texture Vocabulary-conditioned Human Avatars
Yuxiao Liu (None) · Zhe Li (Tsinghua University) · Yebin Liu (Tsinghua University) · Haoqian Wang (Tsinghua University)
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Thomas Wimmer (École Polytechnique & Technical University of Munich) · Peter Wonka (KAUST) · Maks Ovsjanikov (Ecole Polytechnique, France)
SketchINR: A First Look into Sketches as Implicit Neural Representations
Hmrishav Bandyopadhyay (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Pinaki Nath Chowdhury (University of Surrey) · Aneeshan Sain (University of Surrey) · Tao Xiang (University of Surrey) · Timothy Hospedales (None) · Yi-Zhe Song (None)
Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
Jinyoung Jun (Korea University) · Jae-Han Lee (Gauss Labs) · Chang-Su Kim (Korea University)
Neighbor Relations Matter in Video Scene Detection
Jiawei Tan (Chongqing University) · Hongxing Wang (Chongqing University) · Jiaxin Li (Chongqing University) · Zhilong Ou (Chongqing University) · Zhangbin Qian (Chongqing University)
NOPE: Novel Object Pose Estimation from a Single Image
Van Nguyen Nguyen (Ecole des Ponts ParisTech) · Thibault Groueix (Adobe Systems) · Georgy Ponimatkin (CIIRC, Czech Technical University, Czech Technical University of Prague) · Yinlin Hu (Magic Leap) · Renaud Marlet (INRIA) · Mathieu Salzmann (EPFL) · Vincent Lepetit (Ecole des Ponts ParisTech)
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang (Institute of Automation, Chinese Academy of Sciences) · Jiawei He (Institute of Automation, Chinese Academy of Sciences) · Lue Fan (Institute of Automation, Chinese Academy of Sciences) · Hongxin Li (Institute of Automation, Chinese Academy of Sciences) · Yuntao Chen (CAIR, HKISI, CAS) · Zhaoxiang Zhang (Institute of Automation, Chinese Academy of Sciences)
Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics
Xingtao Wang (Harbin Institute of Technology) · Hongliang Wei (Harbin Institute of Technology) · Xiaopeng Fan (Harbin Institute of Technology) · Debin Zhao (Harbin Institute of Technology)
Link-Context Learning for Multimodal LLMs
Yan Tai (Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China) · Weichen Fan (HyperGAI) · Zhao Zhang (Sensetime Research) · Ziwei Liu (Nanyang Technological University)
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai (None) · Kuofeng Gao (Tsinghua University) · Shaobo Min (University of Science and Technology of China) · Shu-Tao Xia (Shenzhen International Graduate School, Tsinghua University) · Zhifeng Li (Tencent) · Wei Liu (Tencent AI Lab)
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva (University of Oxford) · Andrew Zisserman (University of Oxford)
Unsupervised Template-assisted Point Cloud Shape Correspondence Network
Jiacheng Deng (University of Science and Technology of China) · Jiahao Lu (University of Science and Technology of China) · Tianzhu Zhang (University of Science and Technology of China)
Efficient Model Stealing Defense with Noise Transition Matrix
Dong-Dong Wu (Southeast University) · Chilin Fu (Ant Group) · Weichang Wu (Alibaba Group) · Wenwen Xia (Shanghai Jiaotong University) · Xiaolu Zhang (None) · JUN ZHOU (Ant Group) · Min-Ling Zhang (Southeast University)
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan (Zhejiang University) · Shiwei Zhang (Alibaba Group) · Xiang Wang (Huazhong University of Science and Technology) · Yujie Wei (Fudan University) · Tao Feng (Tsinghua University) · Yining Pan (Singapore University of Technology and Design) · Yingya Zhang (Alibaba Group) · Ziwei Liu (Nanyang Technological University) · Samuel Albanie (University of Cambridge) · Dong Ni (Zhejiang University)
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal (University of California, Los Angeles) · Yonatan Bitton (Google) · Idan Szpektor (Google) · Kai-Wei Chang (University of California, Los Angeles) · Aditya Grover (University of California, Los Angeles)
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu (Hong Kong University of Science and Technology (Guangzhou)) · Xu Zheng (HKUST) · Jiazhou Zhou (Hong Kong University of Science and Technology) · Lin Wang (Hong Kong University of Science and Technology)
HEAL-SWIN: A Vision Transformer On The Sphere
Oscar Carlsson (Division of Algebra and Geometry, Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg) · Jan E. Gerken (Chalmers University of Technology) · Hampus Linander (Chalmers University of Technology) · Heiner Spiess (Technische Universität Berlin) · Fredrik Ohlsson (Umea University) · Christoffer Petersson (Zenseact) · Daniel Persson (Chalmers University of Technology)
FMA-Net: Flow Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Geunhyuk Youk (Korea Advanced Institute of Science and Technology) · Jihyong Oh (Chung-Ang University) · Munchurl Kim (Korea Advanced Institute of Science and Technology)
UniDepth: Universal Monocular Metric Depth Estimation
Luigi Piccinelli (ETH Zurich) · Yung-Hsu Yang (None) · Christos Sakaridis (ETH Zurich) · Mattia Segu (ETH Zurich - Swiss Federal Institute of Technology) · Siyuan Li (ETH Zurich) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Fisher Yu (ETH Zurich)
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain (Georgia Tech) · Jianwei Yang (Microsoft Research) · Humphrey Shi (Georgia Tech | UIUC / Oregon | PAIR)
QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
Ishak Ayad (ETIS & AGM, CY Cergy Paris University, ENSEA, CNRS) · Nicolas Larue (ETIS , CY Cergy Paris University, ENSEA, CNRS, University of Ljubljana) · Mai K. Nguyen (ETIS , CY Cergy Paris University, ENSEA, CNRS)
Cross-dimension Affinity Distillation for 3D EM Neuron Segmentation
Xiaoyu Liu (University of Science and Technology of China) · Miaomiao Cai (University of Science and Technology of China) · Yinda Chen (University of Science and Technology of China) · Yueyi Zhang (University of Science and Technology of China) · Te Shi (Institute of Artificial Intelligence, Hefei Comprehensive National Science Center) · Ruobing Zhang (Suzhou Institute of Biomedical Engineering and Technology) · Xuejin Chen (University of Science and Technology of China) · Zhiwei Xiong (None)
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Xunjiang Gu (University of Toronto) · Guanyu Song (University of Toronto) · Igor Gilitschenski (University of Toronto) · Marco Pavone (NVIDIA) · Boris Ivanovic (NVIDIA)
Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation
Keonhee Han (Technical University of Munich) · Dominik Muhle (Technical University of Munich) · Felix Wimbauer (Technical University of Munich) · Daniel Cremers (Technical University Munich)
CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection
JiaBao Wang (Nankai University) · yuming chen (None) · Zhaohui Zheng (Nankai University) · Xiang Li (Nankai University) · Ming-Ming Cheng (Nankai University, Tsinghua University) · Qibin Hou (Nankai University)
Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion
Lalit Manam (Indian Institute of Science) · Venu Madhav Govindu (Indian Institute of Science)
Seeing the World through Your Eyes
Hadi Alzayer (University of Maryland) · Kevin Zhang (UMD CP / Adobe) · Brandon Y. Feng (Massachusetts Institute of Technology) · Christopher Metzler (University of Maryland, College Park) · Jia-Bin Huang (University of Maryland, College Park)
Equivariant Multi-Modality Image Fusion
Zixiang Zhao (Xi'an Jiaotong University) · Haowen Bai (Xi'an Jiaotong University) · Jiangshe Zhang (Xi'an Jiaotong University) · Yulun Zhang (Shanghai Jiao Tong University) · Kai Zhang (None) · Shuang Xu (Northwest Polytechnical University Xi'an) · Dongdong Chen (Heriot-Watt University) · Radu Timofte (University of Würzburg) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
Residual Denoising Diffusion Models
Jiawei Liu (Shenyang Institute of Automation, Chinese Academy of Sciences) · Qiang Wang (Shenyang University) · Huijie Fan (None) · Yinong Wang (University of Hong Kong) · Yandong Tang (Shenyang Institute of Automation) · Liangqiong Qu (The University of Hong Kong)
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Zheng Li (Nankai University) · Xiang Li (Nankai University) · xinyi fu (Ant group) · Xin Zhang (Nankai University) · Weiqiang Wang (University of Southern California) · Shuo Chen (RIKEN) · Jian Yang (Nankai University)
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Ruoyu Feng (University of Science and Technology of China) · Wenming Weng (None) · Yanhui Wang (None) · Yuhui Yuan (Microsoft Research Asia) · Jianmin Bao (Microsoft) · Chong Luo (Microsoft Research Asia) · Zhibo Chen (University of Science and Technology of China) · Baining Guo (Microsoft Research)
CORES: Convolutional Response-based Score for Out-of-distribution Detection
Keke Tang (Guangzhou University) · Chao Hou (Guangzhou University) · Weilong Peng (None) · Runnan Chen (None) · Peican Zhu (Northwest Polytechnical University Xi'an) · Wenping Wang (Texas A&M University - College Station) · Zhihong Tian (Guangzhou University)
MoDE: CLIP Data Experts via Clustering
Jiawei Ma (Columbia University) · Po-Yao Huang (Facebook) · Saining Xie (Facebook) · Shang-Wen Li (Facebook) · Luke Zettlemoyer (University of Washington) · Shih-Fu Chang (Columbia University) · Wen-tau Yih (Meta Platforms, Inc.) · Hu Xu (FAIR, Multimodal Foundation)
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker (Technical University Munich) · Marvin Eisenberger (Technical University Munich) · Daniel Cremers (Technical University Munich) · Laura Leal-Taixe (NVIDIA)
Dual-consistency Model Inversion for Non-exemplar Class Incremental Learning
Zihuan Qiu (University of Electronic Science and Technology of China) · Yi Xu (Dalian University of Technology) · Fanman Meng (University of Electronic Science and Technology of China) · Hongliang Li (University of Electronic Science and Technology of China, Tsinghua University) · Linfeng Xu (University of Electronic Science and Technology of China) · Qingbo Wu (University of Electronic Science and Technology of China)
Class Tokens Infusion for Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon (KAIST) · Hoyong Kwon (KAIST) · Hyeonseong Kim (KAIST) · Kuk-Jin Yoon (KAIST)
PointOBB: Learning Oriented Object Detection via Single Point Supervision
Junwei Luo (Wuhan University) · Xue Yang (Shanghai AI Laboratory) · Yi Yu (Southeast University) · Qingyun Li (Harbin Institute of Technology) · Junchi Yan (Shanghai Jiao Tong University) · Yansheng Li (Wuhan University)
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision
Mohammad Reza Hosseinzadeh Taher (Arizona State University) · Michael Gotway (Mayo Clinic) · Jianming Liang (Arizona State University)
Category-Level Multi-Part Multi-Joint 3D Shape Assembly
Yichen Li (Massachusetts Institute of Technology) · Kaichun Mo (NVIDIA Research) · Yueqi Duan (None) · He Wang (None) · Jiequan Zhang (None) · Lin Shao (National University of Singapore) · Wojciech Matusik (Massachusetts Institute of Technology) · Leonidas Guibas (Stanford University)
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
YUNCHENG GUO (None) · Xiaodong Gu (Fudan University)
A Category Agnostic Model for Visual Rearrangement
Yuyi Liu (Institute of Computing Technology, University of the Chinese Academy of Sciences) · Xinhang Song (None) · Weijie Li (Alibaba Group) · XIAOHAN Wang (Xi'an Jiaotong University) · Shuqiang Jiang (Institute of Computing Technology, Chinese Academy of Sciences)
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng (Yale University) · Hyoungseob Park (Yale University) · Fengyu Yang (Yale University) · Daniel Wang (Yale University) · Stefano Soatto (University of California, Los Angeles) · Dong Lao (University of California, Los Angeles) · Alex Wong (Yale University)
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang (Shanghai AI Laboratory) · Xiaohan Mao (Shanghai Jiaotong University) · Chenming Zhu (The Chinese University of Hong Kong, Shenzhen) · Runsen Xu (The Chinese University of Hong Kong) · Ruiyuan Lyu (Shanghai AI Laboratory) · Peisen Li (Tsinghua University) · Xiao Chen (The Chinese University of Hong Kong) · Wenwei Zhang (None) · Kai Chen (Shanghai AI Laboratory) · Tianfan Xue (The Chinese University of Hong Kong) · Xihui Liu (The University of Hong Kong) · Cewu Lu (Shanghai Jiao Tong University) · Dahua Lin (The Chinese University of Hong Kong) · Jiangmiao Pang (Shanghai AI Laboratory)
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Guillaume Jaume (Harvard University) · Anurag Vaidya (Massachusetts Institute of Technology) · Richard J. Chen (Harvard University) · Drew F. K. Williamson (Massachusetts General Hospital, Harvard University) · Paul Pu Liang (Carnegie Mellon University) · Faisal Mahmood (Harvard University)
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
Lin Zhao (Infinigence) · Tianchen Zhao (Tsinghua University) · Zinan Lin (Microsoft Research) · Xuefei Ning (Tsinghua University) · Guohao Dai (Shanghai Jiaotong University) · Huazhong Yang (Tsinghua University) · Yu Wang (Tsinghua University)
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang (Princeton University) · Difan Liu (Adobe Research) · Yan Kang (None) · Yijun Li (Adobe Research) · Zhe Lin (Adobe Research) · Niraj Jha (Princeton University) · Yuchen Liu (None)
Deep Generative Model based Rate-Distortion for Image Downscaling Assessment
yuanbang liang (Cardiff University) · Bhavesh Garg (IIT Bombay) · Paul L. Rosin (Cardiff University) · Yipeng Qin (Cardiff University)
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng (University of Science and Technology of China) · Joy Hsu (Stanford University) · Weiyu Liu (Stanford University) · Jiajun Wu (Stanford University)
Forecasting of 3D Whole-body Human Poses with Grasping Objects
yan haitao (None) · Qiongjie Cui (Nanjing University of Science and Technology) · Jiexin Xie (Fudan University) · Shijie Guo (Fudan University)
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li (National University of Singapore) · Qianli Shen (National University of Singapore) · Kenji Kawaguchi (National University of Singapore)
Towards Co-Evaluation of Cameras, HDR, and Algorithms for Industrial-Grade 6DoF Pose Estimation
Agastya Kalra (Google) · Guy Stoppi (Intrinsic) · Dmitrii Marin (Intrinsic) · Vage Taamazyan (Intrinsic) · Aarrushi Shandilya (Intrinsic AI) · Rishav Agarwal (Intrinsic) · Anton Boykov (University of Waterloo) · Aaron Chong (Google) · Michael Stark (Intrinsic)
Correcting Diffusion Generation through Resampling
Yujian Liu (University of California, Santa Barbara) · Yang Zhang (International Business Machines) · Tommi Jaakkola (Massachusetts Institute of Technology) · Shiyu Chang (UC Santa Barbara)
Partial-to-Partial Shape Matching with Geometric Consistency
Viktoria Ehm (Technische Universität München) · Maolin Gao (None) · Paul Roetzer (University of Bonn) · Marvin Eisenberger (Technical University Munich) · Daniel Cremers (Technical University Munich) · Florian Bernard (University of Bonn)
Deep Imbalanced Regression via Hierarchical Classification Adjustment
Haipeng Xiong (National University of Singapore) · Angela Yao (National University of Singapore)
Text-guided Explorable Image Super-resolution
Kanchana Vaishnavi Gandikota (None) · Paramanand Chandramouli (Universität Siegen)
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Kaiwen Zhang (Tsinghua University) · Yifan Zhou (Nanyang Technological University) · Xudong XU (Shanghai AI Laboratory) · Bo Dai (Shanghai AI Laboratory) · Xingang Pan (None)
Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis
Mingyang Zhao (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Jiang Jingen (Shandong University) · Lei Ma (Peking University) · Shiqing Xin (Shandong University) · Gaofeng Meng (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Dong-Ming Yan (Institute of Automation, Chinese Academy of Sciences)
LAENeRF: Local Appearance Editing for Neural Radiance Fields
Lukas Radl (Graz University of Technology) · Michael Steiner (Technische Universität Graz) · Andreas Kurz (Technische Universität Graz) · Markus Steinberger (Technische Universität Graz)
Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
Haiwei Chen (USC-ICT, Vision and Graphics Lab) · Yajie Zhao (University of Southern California)
Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
Philipp Schröppel (University of Freiburg, Germany) · Christopher Wewer (Max Planck Institute for Informatics, Saarland Informatics Campus) · Jan Lenssen (Saarland Informatics Campus, Max-Planck Institute) · Eddy Ilg (None) · Thomas Brox (University of Freiburg)
Exploiting Style Latent Flows for Generalizing Video Deepfake Detection
Jongwook Choi (Chung-Ang University) · Taehoon Kim (Chung-Ang University) · Yonghyun Jeong (NAVER) · Seungryul Baek (UNIST) · Jongwon Choi (Chung-Ang University)
Bayesian Differentiable Physics for Cloth Digitalization
Deshan Gong (University of Leeds) · Ningtao Mao (University of Leeds) · He Wang (None)
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang (Alibaba Group) · Chaojie Mao (Alibaba Group) · Yulin Pan (Alibaba Group, China) · Zhen Han (Alibaba Group) · Jingfeng Zhang (Alibaba Group)
DREAM: Diffusion Rectification and Estimation-Adaptive Models
Jinxin Zhou (Ohio State University, Columbus) · Tianyu Ding (Microsoft) · Tianyi Chen (Microsoft) · Jiachen Jiang (Ohio State University, Columbus) · Ilya Zharkov (Microsoft) · Zhihui Zhu (Ohio State University, Columbus) · Luming Liang (Microsoft)
Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling
Leon Sick (Ulm University) · Dominik Engel (Ulm University) · Pedro Hermosilla (Technische Universität Wien) · Timo Ropinski (Ulm University)
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace (University of Trento) · Aliaksandr Siarohin (Snap Inc.) · Ivan Skorokhodov (KAUST) · Ekaterina Deyneka (Snap Inc.) · Tsai-Shien Chen (University of California, Merced) · Anil Kag (Snap Inc.) · Yuwei Fang (Snap Inc.) · Aleksei Stoliar (None) · Elisa Ricci (University of Trento) · Jian Ren (Snap Inc.) · Sergey Tulyakov (Snap Inc.)
URHand: Universal Relightable Hands
Zhaoxi Chen (Nanyang Technological University) · Gyeongsik Moon (None) · Kaiwen Guo (Google) · Chen Cao (Facebook) · Stanislav Pidhorskyi (Meta) · Tomas Simon (Meta) · Rohan Joshi (Facebook) · Yuan Dong (Facebook) · Yichen Xu (Meta platforms inc) · Bernardo Pires (Meta Platforms Inc.) · He Wen (Meta Platforms, Inc.) · Lucas Evans (Meta) · Bo Peng (Meta Platforms Inc.) · Julia Buffalini (Meta) · Autumn Trimble (Meta) · Kevyn McPhail (Meta) · Melissa Schoeller (Meta Platforms Inc) · Shoou-I Yu (Reality Labs Research, Meta) · Javier Romero (None) · Michael Zollhoefer (Meta) · Yaser Sheikh (Meta) · Ziwei Liu (Nanyang Technological University) · Shunsuke Saito (Reality Labs Research)
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni (Institute of Automation, Chinese Academy of Sciences) · Hongbo Zhao (Institute of Automation, Chinese Academy of Sciences) · Chenghao Zhang (Alibaba Group) · Ke Hu (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Gaofeng Meng (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Zhaoxiang Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Shiming Xiang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
Generating Human Motion in 3D Scenes from Text Descriptions
Zhi Cen (Zhejiang University) · Huaijin Pi (Zhejiang University) · Sida Peng (None) · Zehong Shen (Zhejiang University) · Minghui Yang (Ant Group) · Shuai Zhu (Ant Group) · Hujun Bao (Zhejiang University) · Xiaowei Zhou (None)
Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge
Haoxiang Ma (Beihang University) · Modi Shi (Beijing University of Aeronautics and Astronautics) · Boyang GAO (Geometry Robotics ltd. & Harbin Institute of Technology) · Di Huang (Beihang University)
Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment
Aobo Li (Xidian University) · Jinjian Wu (Xidian University) · Yongxu Liu (Xidian University) · Leida Li (Xidian University)
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee (Ulsan National Institute of Science and Technology) · Soyeong Kwon (Ulsan National Institute of Science and Technology) · Taehwan Kim (UNIST)
Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Devikalyan Das (Max Planck Institute for Informatics) · Christopher Wewer (Max Planck Institute for Informatics, Saarland Informatics Campus) · Raza Yunus (Saarland Informatics Campus, Max-Planck Institute) · Eddy Ilg (None) · Jan Lenssen (Saarland Informatics Campus, Max-Planck Institute)
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham (None) · Matthew Fisher (Adobe Research) · James Hays (Georgia Institute of Technology) · Nicholas Kolkin (Adobe Systems) · Yuchen Liu (None) · Richard Zhang (Adobe Systems) · Tobias Hinz (Adobe Systems)
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez (University of Illinois at Urbana Champaign) · Teck-Yian Lim (DSO National Laboratories) · Minh Do (University of Illinois at Urbana-Champaign) · Raymond A. Yeh (Purdue University)
RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses
bedrettin cetinkaya (Middle East Technical University) · Sinan Kalkan (Middle East Technical University) · Emre Akbas (METU)
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi (Independent Researcher)
Learning Triangular Distribution in Visual World
Ping Chen (MicroBT Inc.) · Xingpeng Zhang (Southwest Petroleum University) · Chengtao Zhou (Microbt) · dichao Fan (MicroBT) · Peng Tu (RuqiMobility Inc.) · Le Zhang (Shenzhen MicroBT Electronics Technology Corporation) · Yanlin Qian (Tampere University)
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng (University of Oxford) · Andrea Vedaldi (University of Oxford)
Generalized Event Cameras
Varun Sundar (University of Wisconsin, Madison) · Matthew Dutson (University of Wisconsin, Madison) · Andrei Ardelean (NovoViz) · Claudio Bruschini (EPFL - EPF Lausanne) · Edoardo Charbon (EPFL - EPF Lausanne) · Mohit Gupta (Department of Computer Sciences, University of Wisconsin - Madison)
Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
Yuang Ai (Institute of Automation, Chinese Academy of Sciences) · Huaibo Huang (Institute of Automation, Chinese Academy of Sciences) · Xiaoqiang Zhou (University of Science and Technology of China) · Jiexiang Wang (University of Science and Technology of China) · Ran He (None)
DIEM: Decomposition-Integration Enhancing Multimodal Insights
Xinyi Jiang (None) · Guoming Wang (Zhejiang University) · Junhao Guo (Zhejiang University) · Juncheng Li (Zhejiang University) · Wenqiao Zhang (National University of Singapore) · Rongxing Lu (University of New Brunswick) · Siliang Tang (Zhejiang University)
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar (Indian Institute of Science, Bangalore) · Abhijnya Bhat (Indian Institute of Science, Bangalore) · Abhipsa Basu (Indian Institute of Science) · Saswat Mallick (Indian Institute of Science, Bangalore) · Jogendra Kundu Kundu (None) · R. Venkatesh Babu (Indian Institute of Science)
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Zhen Zhao (East China Normal University) · Jingqun Tang (Bytedance) · Chunhui Lin (Bytedance) · Binghong Wu (Bytedance) · Can Huang (Bytedance) · Hao Liu (Bytedance Inc.) · Xin Tan (East China Normal University) · Zhizhong Zhang (East China Normal University) · Yuan Xie (East China Normal University)
NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Vikas Thamizharasan (University of Massachusetts Amherst) · Difan Liu (Adobe Research) · Matthew Fisher (Adobe Research) · Nanxuan Zhao (Adobe Research) · Evangelos Kalogerakis (UMass Amherst) · Michal Lukáč (Adobe Systems)
Learning to Count without Annotations
Lukas Knobel (University of Amsterdam & TNO) · Tengda Han (University of Oxford) · Yuki Asano (University of Amsterdam)
Towards Backward-Compatible Continual Learning of Image Compression
Zhihao Duan (Purdue University) · Ming Lu (Nanjing University) · Justin Yang (Purdue University) · Jiangpeng He (Purdue University) · Zhan Ma (Nanjing University) · Fengqing Zhu (Purdue University)
Learning to Segment Referred Objects from Narrated Egocentric Videos
Yuhan Shen (Northeastern University) · Huiyu Wang (Facebook) · Xitong Yang (Meta) · Matt Feiszli (Meta AI) · Ehsan Elhamifar (None) · Lorenzo Torresani (Facebook) · Effrosyni Mavroudi ()
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
Songchun Zhang (Zhejiang University) · Yibo Zhang (Jilin University) · Quan Zheng (Institute of Software, Chinese Academy of Sciences) · Rui Ma (Jilin University) · Wei Hua (Zhejiang Lab) · Hujun Bao (Zhejiang University) · Weiwei Xu (Zhejiang University) · Changqing Zou (Zhejiang University)
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou (Huazhong University of Science and Technology) · Shiming Chen (Carnegie Mellon University) · Shuhuang Chen (Huazhong University of Science and Technology) · Ziming Hong (The University of Sydney) · Yan Wang (Alibaba Group) · Xuetao Feng (Alibaba Group) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Fahad Shahbaz Khan (MBZUAI; Linköping University) · Xinge You (Huazhong University of Science and Technology)
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
Haoyu Ma (University of California, Irvine) · Shahin Mahdizadehaghdam (Meta) · Bichen Wu (Facebook) · Zhipeng Fan (Facebook) · Yuchao Gu (None) · Wenliang Zhao (Meta Inc) · Lior Shapira (Meta) · Xiaohui Xie (University of California, Irvine)
Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
Jiyuan Zhang (Peking University) · Shiyan Chen (Peking University) · Yajing Zheng (Peking University) · Zhaofei Yu (Peking University) · Tiejun Huang (Peking University)
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang (Nankai University) · Peng-Tao Jiang (vivo Mobile Communication (Hangzhou) Co., Ltd.) · Qibin Hou (Nankai University) · Hao Zhang (vivo Mobile Communication (Hangzhou) Co., Ltd) · Jinwei Chen (vivo Mobile Communication Co., Ltd.) · Bo Li (vivo Mobile Communication Co., Ltd.)
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng (Tianjin University) · Dan Song (Tianjin University) · Weizhi Nie (Tianjin University) · Hongshuo Tian (Tianjin University) · Tongtong Wang (Tencent LightSpeed Studio) · An-An Liu (Tianjin University)
Making Visual Sense of Oracle Bones for You and Me
Runqi Qiao (Beijing University of Posts and Telecommunications) · LAN YANG (Beijing University of Posts and Telecommunications) · Kaiyue Pang (SketchX AI) · Honggang Zhang (Beijing University of Posts and Telecommunications)
Binarized Low-light Raw Video Enhancement
Gengchen Zhang (Beijing Institute of Technology) · Yulun Zhang (Shanghai Jiao Tong University) · Xin Yuan (Westlake University) · Ying Fu (None)
Coherent Temporal Synthesis for Incremental Action Segmentation
GUODONG DING (NATIONAL UNIVERSITY OF SINGAPORE) · Hans Golong (National University of Singapore) · Angela Yao (National University of Singapore)
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding
Sai Wang (Wuhan University) · Yutian Lin (Wuhan University) · Yu Wu (Wuhan University)
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures
Zhangyang Xiong () · Chenghong Li (The Chinese University of Hong Kong, Shenzhen) · Kenkun Liu (The Chinese University of Hong Kong (Shenzhen)) · Hongjie Liao (Chinese University of Hong Kong, Shenzhen) · Jianqiao HU (The Chinese University of Hong Kong, Shenzhen) · Junyi Zhu (The Chinese University of Hong Kong, Shenzhen) · Shuliang Ning (The Chinese University of Hong Kong, Shenzhen) · Lingteng Qiu (None) · Chongjie Wang (The Chinese University of Hong Kong, Shenzhen) · Shijie Wang (The Chinese University of Hong Kong, Shenzhen) · Shuguang Cui (The Chinese University of Hong Kong, Shenzhen) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Weihuang Liu (University of Macau) · Xi Shen (Tencent AI Lab) · Haolun Li (University of Macau) · Xiuli Bi (Chongqing University of Posts and Telecommunications) · Bo Liu (Chongqing University of Posts and Telecommunications) · Chi-Man Pun (University of Macau) · Xiaodong Cun (Tencent AI Lab)
Communication-Efficient Federated Learning with Accelerated Client Gradient
Geeho Kim (Seoul National University) · Jinkyu Kim (Seoul National University) · Bohyung Han (Seoul National University)
Real-time 3D-aware Portrait Video Relighting
Ziqi Cai (Chinese Academy of Sciences & Beijing Jiaotong University) · Kaiwen Jiang (None) · Shu-Yu Chen (Chinese Academy of Sciences) · Yu-Kun Lai (Cardiff University) · Hongbo Fu (City University of Hong Kong) · Boxin Shi (Peking University) · Lin Gao (University of Chinese Academy of Sciences)
Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Huajian Huang (The Hong Kong University of Science and Technology) · Longwei Li (SUN YAT-SEN UNIVERSITY) · Hui Cheng (SUN YAT-SEN UNIVERSITY) · Sai-Kit Yeung (The Hong Kong University of Science and Technology (HKUST))
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang (East China University of Science and Technology) · Mengping Yang (East China University of Science and Technology) · Qin Zhou (East China University of Science and Technology) · Zhe Wang (East China University of Science and Technology)
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
WENCAN CHENG (None) · Hao Tang (ETH Zurich and CMU) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Jong Hwan Ko (Sungkyunkwan University (SKKU))
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li (Osaka University) · Taishi Ono (Sony Semiconductor Solutions Europe) · Takeshi Uemori (Sony Semiconductor Solutions Corporation) · Hajime Mihara (Sony Semiconductor Solutions Corporation) · Alexander Gatto (Sony Semiconductor Solutions Europe) · Hajime Nagahara (Osaka University) · Yusuke Moriuchi (Sony Semiconductor Solutions Corporation)
MaskPLAN: Masked Generative Layout Planning from Partial Input
Hang Zhang (ETHZ - ETH Zurich) · Anton Savov (ETHZ - ETH Zurich) · Benjamin Dillenburger (ETHZ - ETH Zurich)
Rapid 3D Model Generation with Intuitive 3D Input
Tianrun Chen (Zhejiang University) · Chaotao Ding (Huzhou university) · Shangzhan Zhang (Zhejiang University) · Chunan Yu (Huzhou University) · Ying Zang (Huzhou University) · Zejian Li (Zhejiang University) · Sida Peng (None) · Lingyun Sun (Zhejiang University)
MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception
Thien-Minh Nguyen (Nanyang Technological University) · Shenghai Yuan (National Technological University) · Thien Nguyen (Nanyang Technological University) · Pengyu Yin (Nanyang Technological University) · Haozhi Cao (Nanyang Technological University) · Lihua Xie (Nanyang Technological University) · Maciej Wozniak (KTH Royal Institute of Technology) · Patric Jensfelt (KTH Royal Institute of Technology, Stockholm, Sweden) · Marko Thiel (Hamburg University of Technology) · Justin Ziegenbein (Hamburg University of Technology) · Noel Blunder (Institute for Technical Logistics - Hamburg University of Technology)
Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Stanislaw Szymanowicz (University of Oxford) · Christian Rupprecht (University of Oxford) · Andrea Vedaldi (University of Oxford)
$CrowdDiff$: Multi-hypothesis Crowd Density Estimation using Diffusion Models
Yasiru Ranasinghe (Johns Hopkins University) · Nithin Gopalakrishnan Nair (Johns Hopkins University) · Wele Gedara Chaminda Bandara (Johns Hopkins University) · Vishal M. Patel (Johns Hopkins University)
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian (University of Science and Technology of China) · Qi Cai (JD) · Yingwei Pan (HiDream.ai) · Yehao Li (HiDream.ai) · Ting Yao (JD AI Research) · Qibin Sun (University of Science and Technology of China) · Tao Mei (JD Explore Academy)
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Minh-Tuan Tran (Monash University) · Trung Le (Monash University) · Xuan-May Le (University of Melbourne) · Mehrtash Harandi (Monash University) · Dinh Phung (Monash University)
Unsupervised Salient Instance Detection
Xin Tian (Huawei Technologies Ltd.) · Ke Xu (City University of Hong Kong) · Rynson W.H. Lau (City University of Hong Kong)
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
Kim Youwang (Pohang University of Science and Technology) · Tae-Hyun Oh (None) · Gerard Pons-Moll (University of Tübingen)
L-MAGIC: Language Model Assisted Generation of Images with Consistency
zhipeng cai (Intel Labs) · Matthias Mueller (None) · Reiner Birkl (Intel Corporation) · Diana Wofk (Intel) · Shao-Yen Tseng (Intel) · JunDa Cheng (Huazhong University of Science and Technology) · Gabriela Ben Melech Stan (Intel) · Vasudev Lal (None) · Michael Paulitsch (Intel)
GLACE: Global Local Accelerated Coordinate Encoding
Fangjinhua Wang (None) · Xudong Jiang (ETHZ - ETH Zurich) · Silvano Galliani (Microsoft) · Christoph Vogel (Microsoft) · Marc Pollefeys (ETH Zurich / Microsoft)
HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
HyunJun Jung (Technische Universität München) · Shun-Cheng Wu (Technical University Munich) · Patrick Ruhkamp (Technical University Munich) · Guangyao Zhai (Technical University of Munich) · Hannah Schieber (Technische Universität München/Friedrich-Alexander Universität Erlangen-Nürnberg) · Giulia Rizzoli (University of Padua) · Pengyuan Wang (Technische Universität München) · Hongcheng Zhao (Technische Universität München) · Lorenzo Garattoni (Toyota Motor Europe) · Sven Meier (Toyota Motor Europe NV/SA) · Daniel Roth (Technische Universität München) · Nassir Navab (TU Munich) · Benjamin Busam (Technical University of Munich)
Customization Assistant for Text-to-image Generation
Yufan Zhou (State University of New York, Buffalo) · Ruiyi Zhang (Adobe Research) · Jiuxiang Gu (Adobe Systems) · Tong Sun (Adobe Systems)
UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
Mingyuan Zhou (Innopeak Technology) · Rakib Hyder (Oppo, Seattle, USA) · Ziwei Xuan (Innopeak Technology) · Guo-Jun Qi (University of Central Florida)
Event-based Structure-from-Orbit
Ethan Elms (University of Adelaide) · Yasir Latif (The University of Adelaide) · Tae Ha Park (Stanford University) · Tat-Jun Chin (None)
From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
Jaeho Moon (KAIST) · Juan Luis Gonzalez Bello (KAIST) · Byeongjun Kwon (KAIST) · Munchurl Kim (Korea Advanced Institute of Science and Technology)
Dynamic LiDAR Re-simulation using Compositional Neural Fields
Hanfeng Wu (None) · Xingxing Zuo (Caltech) · Stefan Leutenegger (Department of Informatics, Technische Universität München) · Or Litany (NVIDIA / Technion) · Konrad Schindler (ETH Zurich) · Shengyu Huang (None)
Unsupervised Blind Image Deblurring Based on Self-Enhancement
Lufei Chen (Sichuan University) · Xiangpeng Tian (SiChuan University) · Shuhua Xiong (Sichuan University) · Yinjie Lei (Sichuan University) · Chao Ren (Sichuan University)
SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields
Quentin HERAU (Huawei/University of Burgundy) · Nathan Piasco (Huawei Technologies Ltd.) · Moussab Bennehar (Huawei Noah's Ark Lab) · Luis Guillermo Roldao Jimenez (Huawei Technologies Ltd.) · Dzmitry Tsishkou (Huawei Technologies Ltd.) · Cyrille Migniot (University of Burgundy) · Modélisation Information Systèmes (Université de Picardie Jules-Verne) · Cedric Demonceaux (Université de Bourgogne)
SemCity: Semantic Scene Generation with Triplane Diffusion
Jumin Lee (Korea Advanced Institute of Science and Technology) · Sebin Lee (Korea Advanced Institute of Science and Technology (KAIST)) · Changho Jo (Neosapience) · Woobin Im (Korea Advanced Institute of Science and Technology) · Ju-hyeong Seon (Korea Advanced Institute of Science & Technology) · Sung-Eui Yoon (KAIST)
Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
Yiwei Yang (University of Washington) · Anthony Liu (University of Michigan) · Robert Wolfe (University of Washington) · Aylin Caliskan (University of Washington) · Bill Howe (University of Washington)
StyLitGAN: Image-based Relighting via Latent Control
Anand Bhattad (None) · James Soole (University of Illinois Urbana-Champaign) · David Forsyth (University of Illinois at Urbana-Champaign)
Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability
Jaehui Hwang (Yonsei University) · Junghyuk Lee (Yonsei University) · Jong-Seok Lee (Yonsei University)
ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Weiyao Wang (Facebook) · Pierre Gleize (Polytech Nice Sophia) · Hao Tang (Meta Platforms) · Xingyu Chen (Facebook) · Kevin Liang (FAIR at Meta) · Matt Feiszli (Meta AI)
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling (University of Science and Technology of China) · Lin Chen (University of Science and Technology of China) · Pan Zhang (Shanghai Artificial Intelligence Laboratory) · Huaian Chen (University of Science and Technology of China) · Yi Jin (University of Science and Technology of China) · Jinjin Zheng (University of Science and Technology of China)
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon (Yonsei University) · Dohyung Kim (Yonsei University) · Jun Yong Cheon (Yonsei University) · Bumsub Ham (Yonsei University)
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Tianyi Xie (University of California, Los Angeles) · Zeshun Zong (University of California, Los Angeles) · Yuxing Qiu (UCLA & LightSpeed Studios) · Xuan Li (None) · Yutao Feng (Zhejiang University) · Yin Yang (University of Utah) · Chenfanfu Jiang (University of California, Los Angeles)
Viewpoint-Aware Visual Grounding in 3D Scenes
Xiangxi Shi (Oregon State University) · Zhonghua Wu (SenseTime) · Stefan Lee (Oregon State University)
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kang chenkang (Huaihai Institute of Technology) · Xiangqian Wu (Harbin Institute of Technology)
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan (Kuaishou Technology) · Hongbo Liu (Tsinghua University) · Mading Li (Kuaishou Technology) · Muyi Sun (Institute of automation, Chinese Academy of Sciences) · Ming Sun (Kuaishou Tech) · Jiachao Gong (Beijing Kuaishou) · Jinhua Hao (Kuaishou Tech) · Chao Zhou (Peking University) · Yansong Tang (Tsinghua University)
Batch Normalization Alleviates the Spectral Bias in Coordinate Networks
Zhicheng Cai (Nanjing University) · Hao Zhu (Nanjing University) · Qiu Shen (Nanjing University) · Xinran Wang (Nanjing University) · Xun Cao (Nanjing University)
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Zhe Li (Tsinghua University) · Zerong Zheng (Tsinghua University) · Lizhen Wang (Tsinghua University, Tsinghua University) · Yebin Liu (Tsinghua University)
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li (Nanyang Technological University) · Haobo Yuan (Nanyang Technological University) · Wei Li (Nanyang Technological University) · Henghui Ding (Fudan University) · Size Wu (Nanyang Technological University) · Wenwei Zhang (None) · Yining Li (Shanghai AI Laboratory) · Kai Chen (Shanghai AI Laboratory) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY)
Pose Adapted Shape Learning for Large-Pose Face Reenactment
Gee-Sern Hsu (None) · Jie-Ying Zhang (National Taiwan University of Science and Technology) · Yu-Hsiang Huang (National Taiwan University of Science and Technology) · Wei-Jie Hong (National Taiwan University of Science and Technology)
Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models
Gianni Franchi (ENSTA Paris) · Olivier Laurent (Université Paris-Saclay) · Maxence Leguéry (ENSTA Paris) · Andrei Bursuc (valeo.ai) · Andrea Pilzer (NVIDIA) · Angela Yao (National University of Singapore)
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti (University of Modena and Reggio Emilia) · Roberto Amoroso (University of Modena and Reggio Emilia) · Marcella Cornia (University of Modena and Reggio Emilia) · Lorenzo Baraldi (Università degli Studi di Modena e Reggio Emilia) · Rita Cucchiara (Università di Modena e Reggio Emilia)
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu (The University of Tokyo) · Zihao Zhu (Keio University) · Giorgio Becherini (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · YICHEN PENG (Japan Advanced Institute of Science and Technology, Tokyo Institute of Technology) · Mingyang Su (Tsinghua University, Tsinghua University) · YOU ZHOU (Huawei Technologies Ltd.) · Xuefei Zhe (City University of Hong Kong) · Naoya Iwamoto (Huawei Technologies Japan K.K.) · Bo Zheng (Huawei Technologies Japan) · Michael J. Black (University of Tübingen)
Explaining CLIP's performance disparities on data from blind/low vision users
Daniela Massiceti (Microsoft Research) · Camilla Longden (Microsoft Research, Cambridge) · Agnieszka Słowik (Microsoft) · Samuel Wills (World Bank) · Martin Grayson (Research, Microsoft) · Cecily Morrison (Microsoft Research)
NB-GTR: Narrow-Band Guided Turbulence Removal
Yifei Xia (Peking University) · Chu Zhou (Peking University) · Chengxuan Zhu (Peking University) · Minggui Teng (Peking University) · Chao Xu (Peking University) · Boxin Shi (Peking University)
LaneCPP: Continuous 3D Lane Detection using Physical Priors
Maximilian Pittner (Bosch) · Joel Janai (Robert Bosch GmbH, Bosch) · Alexandru Paul Condurache (None)
Large Language Models are Good Prompt Learners for Low-Shot Image Classification
Zhaoheng Zheng (University of Southern California) · Jingmin Wei (University of Southern California) · Xuefeng Hu (University of Southern California) · Haidong Zhu (University of Southern California) · Ram Nevatia (None)
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Yufei Zhang (None) · Jeffrey Kephart (IBM, International Business Machines) · Zijun Cui (University of Southern California) · Qiang Ji (Rensselaer Polytechnic Institute)
LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment
yiming ren (None) · xiao han (ShanghaiTech University) · Chengfeng Zhao (ShanghaiTech University) · Jingya Wang (ShanghaiTech University) · Lan Xu (ShanghaiTech University) · Jingyi Yu (Shanghai Tech University) · Yuexin Ma (ShanghaiTech University)
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu (University of Washington) · Otilia Stretcu (Google Research) · Chun-Ta Lu (Google Research) · Krishnamurthy Viswanathan (Google) · Kenji Hata (Google) · Enming Luo (Google) · Ranjay Krishna (University of Washington) · Ariel Fuxman (Google)
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal (University of Missouri) · Aditya Avinash (Google) · Neil Alldrin (Google) · Jan Dlabal (Research, Google) · Wenlei Zhou (Google) · Enming Luo (Google) · Otilia Stretcu (Google Research) · Hao Xiong (Google) · Chun-Ta Lu (Google Research) · Howard Zhou (Google Research) · Ranjay Krishna (University of Washington) · Ariel Fuxman (Google) · Tom Duerig (Google)
FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding
Jinglin Xu (University of Science and Technology Beijing) · Guohao Zhao (Peking University) · Sibo Yin (Peking University) · Wenhao Zhou (University of Science and Technology Beijing) · Yuxin Peng (Peking University)
An Aggregation-Free Federated Learning for Tackling Data Heterogeneity
Yuan Wang (Institute of High Performance Computing, Singapore, A*STAR) · Huazhu Fu (Institute of High Performance Computing, Singapore, A*STAR) · Renuga Kanagavelu (Institute of High Performance Computing, Singapore, A*STAR) · Qingsong Wei (Agency for Science, Technology and Research (A*STAR)) · Yong Liu (Institute of High Performance Computing, Singapore, A*STAR) · Rick Goh (Institute of High Performance Computing, Singapore, A*STAR)
Infrared Adversarial Car Stickers
Xiaopei Zhu (Tsinghua University) · Yuqiu Liu (Beijing Forestry University) · Zhanhao Hu (UC Berkeley) · Jianmin Li (Department of computer science and technology, Tsinghua University) · Xiaolin Hu (Tsinghua University, Tsinghua University)
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina (University of Trento) · Massimiliano Mancini (University of Trento) · Elia Cunegatti (University of Trento) · Gaowen Liu (None) · Giovanni Iacca (University of Trento) · Elisa Ricci (University of Trento)
CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
Yiyu Chen (Beijing Institute of Technology) · Zheyi Fan (Beijing Institute of Technology) · Zhaoru Chen (Beijing Institute of Technology) · Yixuan Zhu (Beijing Institute of Technology)
CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition
Qixuan Zheng (City University of Hong Kong) · Ming Zhang (Hong Kong Applied Science and Technology Research Institute (ASTRI)) · Hong Yan (City University of Hong Kong)
Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction
Cheng Sun (NVIDIA) · Wei-En Tai (National Tsing Hua University) · Yu-Lin Shih (National Tsing Hua University) · Kuan-Wei Chen (National Tsing Hua University) · Yong-Jing Syu (National Tsing Hua University) · Kent Selwyn The (National Tsing Hua University) · Yu-Chiang Frank Wang (NVIDIA) · Hwann-Tzong Chen (National Tsing Hua University)
Boosting Adversarial Transferability by Block Shuffle and Rotation
Kunyu Wang (The Chinese University of Hong Kong) · he xuanran (TikTok) · Wenxuan Wang (The Chinese University of Hong Kong) · Xiaosen Wang (Huazhong University of Science and Technology)
Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks
Bowen Deng (Computer Vision Laboratory University of Nottingham) · Siyang Song (University of Leicester) · Andrew French (University of Nottingham) · Denis Schluppeck (University of Nottingham) · Michael Pound (University of Nottingham)
GALA: Generating Animatable Layered Assets from a Single Scan
Taeksoo Kim (Seoul National University) · Byungjun Kim (Seoul National University) · Shunsuke Saito (Reality Labs Research) · Hanbyul Joo (Seoul National University)
Single Mesh Diffusion Models with Field Latents for Texture Generation
Thomas W. Mitchel (PlayStation) · Carlos Esteves (Google Research) · Ameesh Makadia (Google Research)
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo (CMU, Carnegie Mellon University) · Pedro Morgado (None)
Move Anything with Layered Scene Diffusion
Jiawei Ren (Nanyang Technological University) · Mengmeng Xu (Meta AI) · Jui-Chieh Wu (Meta) · Ziwei Liu (Nanyang Technological University) · Tao Xiang (University of Surrey) · Antoine Toisoul (Meta)
Learning Diffusion Texture Priors for Image Restoration
Tian Ye (Hong Kong University of Science and Technology, Guangzhou Campus) · Sixiang Chen (Hong Kong University of Science and Technology (GZ)) · Wenhao Chai (University of Washington) · Zhaohu Xing (Hong Kong University of Science and Technology) · Jing Qin (Hong Kong Polytechnic University) · Ge lin (Hong Kong University of Science and Technology (Guangzhou)) · Lei Zhu (Hong Kong University of Science and Technology (Guangzhou) & HKUST)
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng (Shenzhen MSU-BIT University) · Xiaoyong Chen (Shenzhen University) · Jiaming Liang (Shenzhen University) · Huisi Wu (Shenzhen University) · Guang-Zhong Cao (Shenzhen University) · Yong Guo (Max-Planck Institute for Informatics)
Implicit Event-RGBD Neural SLAM
Delin Qu (Fudan University) · Chi Yan (Shanghai AI Laboratory) · Dong Wang (Shanghai AI Laboratory) · Jie Yin (Shanghai Jiaotong University) · Qizhi Chen (None) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology) · Yiting Zhang (Zhejiang University) · Bin Zhao (Northwest Polytechnical University Xi'an) · Xuelong Li (Northwestern Polytechnical University)
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection
Xiaowei Zhao () · Xianglong Liu (BUAA) · Duorui Wang (Beijing University of Aeronautics and Astronautics) · Yajun Gao (Beihang University) · Zhide Liu (Beijing University of Aeronautics and Astronautics)
RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
Yi Rong (Nanjing University) · Haoran Zhou (Nanjing University) · Kang Xia (Nanjing University) · Cheng Mei (Nanjing University) · Jiahao Wang () · Tong Lu (Nanjing University)
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
Jiun Tian Hoe (Nanyang Technological University) · Xudong Jiang (Nanyang Technological University) · Chee Seng Chan (Universiti Malaya) · Yap-peng Tan (Nanyang Technological University) · Weipeng Hu (Nanyang Technological University)
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
Wenxiao Deng (Nanjing University) · Wenbin Li (Nanjing University) · Tianyu Ding (Microsoft) · Lei Wang (University of Wollongong) · Hongguang Zhang (Systems Engineering Institute, AMS) · Kuihua Huang (National University of Defense Technology) · Jing Huo (Nanjing University) · Yang Gao (Nanjing University)
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Junming Chen (Hong Kong University of Science and Technology) · Yunfei Liu (International Digital Economy Academy (IDEA)) · Jianan Wang (None) · Ailing Zeng (IDEA) · Yu Li (International Digital Economy Academy) · Qifeng Chen (Hong Kong University of Science and Technology)
One-step Diffusion with Distribution Matching Distillation
Tianwei Yin (Massachusetts Institute of Technology) · Michaël Gharbi (Massachusetts Institute of Technology) · Richard Zhang (Adobe Systems) · Eli Shechtman (Adobe) · Fredo Durand (Massachusetts Institute of Technology) · William Freeman (MIT and Google) · Taesung Park (Adobe Systems)
On Exact Inversion of DPM-Solvers
Seongmin Hong (Seoul National University) · Kyeonghyun Lee (Seoul National University) · Suh Yoon Jeon (Seoul National University) · Hyewon Bae (Seoul National University) · Se Young Chun (Seoul National University)
Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
Yuxi Mi (Fudan University) · Zhizhou Zhong (Fudan University) · Yuge Huang (Tencent Youtu Lab) · Jiazhen Ji (Tencent Youtu Lab) · Jianqing Xu (HIT) · Jun Wang (None) · ShaoMing Wang (WeChat Pay Lab33) · Shouhong Ding (Tencent Youtu Lab) · Shuigeng Zhou (Fudan University)
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang (University of Michigan) · Ziqiao Ma (University of Michigan) · Xiaofeng Gao (Amazon AGI) · Suhaila Shakiah (Amazon) · Qiaozi Gao (Amazon) · Joyce Chai (University of Michigan)
DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting
Jer Pelhan (University of Ljubljana) · Alan Lukezic (University of Ljubljana) · Vitjan Zavrtanik (University of Ljubljana) · Matej Kristan (University of Ljubljana)
CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral (Virginia Tech) · Enis Simsar (ETH Zurich) · Federico Tombari (Google, TUM) · Pinar Yanardag (Virginia Polytechnic Institute and State University)
Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization
Insoo Kim (Korea Advanced Institute of Science and Technology) · Jae Seok Choi (Samsung Advanced Institute of Technology (SAIT)) · Geonseok Seo (Samsung) · Kinam Kwon (Samsung) · Jinwoo Shin (Korea Advanced Institute of Science and Technology) · Hyong-Euk Lee (Samsung Advanced Institute of Technology)
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
Lin Li (King's College London) · Haoyan Guan (King's College London, University of London) · Jianing Qiu (Imperial College London) · Michael Spratling (King's College London and University of Luxembourg)
Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model
Wenfeng Song (Beijing Information Science and Technology University) · Xingliang Jin (Beijing Information Science and Technology University) · Shuai Li (Beijing University of Aeronautics and Astronautics) · Chenglizhao Chen (China University of Petroleum) · Aimin Hao (None) · Xia HOU (Beijing Information Science & Technology University) · Ning Li (Beijing Information Science and Technology University) · Hong Qin (Stony Brook University (State University of New York at Stony Brook))
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
Jijie He (Zhejiang Gongshang University) · Wenwu Yang (Zhejiang Gongshang University)
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Argaw Argaw (None) · Seunghyun Yoon (Adobe Research) · Fabian Caba Heilbron (Adobe Research) · Hanieh Deilamsalehy (None) · Trung Bui (Adobe Research) · Zhaowen Wang (Adobe Research) · Franck Dernoncourt (Adobe Systems) · Joon Chung (KAIST)
Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen (University of Oxford) · Yash Bhalgat (Visual Geometry Group, University of Oxford) · Xinghui Li (University of Oxford) · Jia-Wang Bian (University of Oxford) · Kejie Li (University of Oxford) · Zirui Wang (University of Oxford) · Victor Adrian Prisacariu (None)
Single-View Scene Point Cloud Human Grasp Generation
Yan-Kang Wang (SUN YAT-SEN UNIVERSITY) · Chengyi Xing (Stanford University) · Yi-Lin Wei (SUN YAT-SEN UNIVERSITY) · Xiao-Ming Wu (SUN YAT-SEN UNIVERSITY) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY)
UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement
yaofeng xie (Ocean University of China) · Lingwei Kong (Sanya Oceanographic Institution, Ocean University of China) · Kai Chen (Sanya Oceanographic Institution, Ocean University of China) · Zheng Ziqiang (Hong Kong University of Science and Technology) · Xiao Yu (Sanya Oceanographic Institution, Ocean University of China) · Zhibin Yu (Sanya Oceanographic Institution, Ocean University of China) · Bing Zheng (Sanya Oceanographic Institution, Ocean University of China)
GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects
Sungphill Moon (Naver Labs) · Hyeontae Son (Naver Labs) · Dongcheol Hur (NAVER LABS) · Sangwook Kim (Naver Labs)
DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng (Peking University Shenzhen Graduate School) · Liwei Liao (Peking University) · Xufeng Li (Cityu) · Jianbo Jiao (University of Birmingham) · Rongjie Wang (PengCheng Laboratory) · Feng Gao (Peking University) · Shiqi Wang (City University of Hong Kong) · Ronggang Wang (Peking University Shenzhen Graduate School)
Improving Depth Completion via Depth Feature Upsampling
Yufei Wang (Northwest Polytechnical University Xi'an) · Ge Zhang (Northwest Polytechnical University Xi'an) · Shaoqian Wang (Northwest Polytechnical University Xi'an) · Bo Li (None) · Qi Liu (Northwest Polytechnical University Xi'an) · Le Hui (Nanjing University Of Science And Technology) · Yuchao Dai (Northwestern Polytechnical University)
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel (Arizona State University) · Changhoon Kim (Arizona State University) · Sheng Cheng (Arizona State University) · Chitta Baral (Arizona State University) · 'YZ' Yezhou Yang (Arizona State University)
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Ziyang Chen (University of Michigan) · Israel D. Gebru (Facebook) · Christian Richardt (Meta Reality Labs) · Anurag Kumar (Facebook) · William Laney (Meta) · Andrew Owens (University of Michigan) · Alexander Richard (Reality Labs Research, Meta)
Nearest Is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li (Wuhan University) · Yishuo Cai (Central South University) · Haowei Li (Wuhan University) · Feng Xue (ZJU-Hangzhou Global Scientific and Technological Innovation Center) · Zhifeng Li (Tencent) · Yiming Li (Zhejiang University)
Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
Chengxu Zuo (Xiamen University) · Yiming Wang (Xiamen University) · Lishuang Zhan (Xiamen University) · Shihui Guo (Xiamen University) · Xinyu Yi (Tsinghua University) · Feng Xu (Tsinghua University, Tsinghua University) · Yipeng Qin (Cardiff University)
Neural Exposure Fusion for High-Dynamic Range Object Detection
Emmanuel Onzon (Torc Robotics) · Maximilian Bömer (Torc Robotics) · Fahim Mannan () · Felix Heide (Department of Computer Science, Princeton University)
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu (National University of Singapore) · Wenjie Wang (National University of Singapore) · Yongqi Li (Hong Kong Polytechnic University) · Hanwang Zhang (Nanyang Technological University) · Liqiang Nie (Harbin Institute of Technology (Shenzhen)) · Tat-seng Chua (National University of Singapore)
Model Adaptation for Time Constrained Embodied Control
Jaehyun Song (Sungkyunkwan University) · Minjong Yoo (Sungkyunkwan University) · Honguk Woo (Sungkyunkwan University)
GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes
Haozhe Lin (None) · Chunyu Wei (Tsinghua University, Tsinghua University) · Li He (None) · Yuchen Guo (Tsinghua University, Tsinghua University) · Yuchy Zhao (Tsinghua University, Tsinghua University) · Shanglong Li (Tsinghua University) · Lu Fang (Tsinghua University, Tsinghua University)
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang (Seoul National University) · Francis Williams (NVIDIA) · Žan Gojčič (NVIDIA) · Karsten Kreis (NVIDIA) · Sanja Fidler (Department of Computer Science, University of Toronto) · Young Min Kim (Seoul National University) · Amlan Kar (NVIDIA)
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang (Oregon State University) · Saeed Khorram (Apple) · Li Fuxin (Oregon State University)
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed (Khalifa University of Science and Technology) · Arif Mahmood (Information Technology University, Lahore) · IYYAKUTTI IYAPPAN GANAPATHI (Khalifa University of Science, Technology and Research) · Fayaz Ali (Khalifa University of Science, Technology and Research) · Naoufel Werghi (Khalifa University) · Mohammed Bennamoun (University of Western Australia)
TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation
Hoonhee Cho (KAIST) · Taewoo Kim (KAIST) · Yuhwan Jeong (KAIST) · Kuk-Jin Yoon (KAIST)
One-Prompt to Segment All Medical Images
Wu (None) · Min Xu (Carnegie Mellon University)
Quantifying Task Priority for Multi-Task Optimization
Wooseong Jeong (KAIST) · Kuk-Jin Yoon (KAIST)
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai (New York University) · Xichen Pan (New York University) · Sainan Liu (Intel) · Daniele Panozzo (New York University) · Saining Xie (Facebook)
UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
Ruihai Wu (Peking University) · Haoran Lu (Peking University) · Yiyan Wang (Beijing Institute of Technology) · Yubo Wang (Peking University) · Hao Dong (None)
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu (Shanghai Jiao Tong University) · Yikun Liu (Shanghai Jiaotong University) · Ferenas (None) · Chen Ju () · Ya Zhang (Shanghai Jiao Tong University) · Yanfeng Wang (Shanghai Jiao Tong University)
Distilling Semantic Priors from SAM to Efficient Image Restoration Models
Quan Zhang (Tsinghua University, Tsinghua University) · Xiaoyu Liu (University of Science and Technology of China) · Wei Li (Huawei Noah's Ark Lab) · Hanting Chen (Huawei Technologies Ltd.) · Junchao Liu (Huawei Noah's Ark Lab) · Jie Hu (Huawei Technologies Ltd.) · Zhiwei Xiong (None) · Chun Yuan (Tsinghua University, Tsinghua University) · Yunhe Wang (Huawei Noah's Ark Lab)
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Haiwen Diao (Dalian University of Technology) · Bo Wan (KU Leuven) · Ying Zhang (Tencent) · Xu Jia (Dalian University of Technology) · Huchuan Lu (Dalian University of Technology) · Long Chen (HKUST)
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
AJ Piergiovanni (Google) · Isaac Noble (Google) · Dahun Kim (Google) · Michael Ryoo (Stony Brook University) · Victor Gomes (Google) · Anelia Angelova (Google)
Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods
Mengyu Dai (Florida State University) · Amir Hossein Raffiee (SalesForce.com) · Aashish Jain (Salesforce) · Joshua Correa (SalesForce.com)
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Mohammad Sadil Khan (University of Luxembourg) · Elona Dupont (SnT, University of Luxembourg) · Sk Aziz Ali (DFKI GmbH) · Kseniya Cherenkova (University of Luxembourg) · Anis Kacem (University of Luxembourg) · Djamila Aouada (SnT, University of Luxembourg)
Instance Tracking in 3D Scenes from Egocentric Videos
Yunhan Zhao (University of California, Irvine) · Haoyu Ma (University of California, Irvine) · Shu Kong (University of Macau, Texas A&M University) · Charless Fowlkes (University of California, Irvine)
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu (Nanyang Technological University) · Tao Hu (Nanyang Technological University) · Ziwei Liu (Nanyang Technological University)
Cyclic Learning for Binaural Audio Generation and Localization
Zhaojian Li (Northwest Polytechnical University) · Bin Zhao (Northwest Polytechnical University Xi'an) · Yuan Yuan (Northwest Polytechnical University Xi'an)
Frequency-aware Event-based Video Deblurring for Real-World Motion Blur
Taewoo Kim (KAIST) · Hoonhee Cho (KAIST) · Kuk-Jin Yoon (KAIST)
Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Yiming Li (New York University) · Zhiheng Li (New York University) · Nuo Chen (New York University) · Moonjun Gong (New York University) · Zonglin Lyu (New York University) · Zehong Wang (New York University) · Peili Jiang (New York University) · Chen Feng (New York University)
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
Chengxiang Fan (Zhejiang University) · Muzhi Zhu (Zhejiang University) · Hao Chen (Zhejiang University) · Yang Liu (Zhejiang University) · Weijia Wu (None) · Huaqi Zhang (Hangzhou VIVO Information Technology Co., Ltd) · Chunhua Shen (Zhejiang University)
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang (University of California, Los Angeles (UCLA)) · Hongbin Liu (Duke University) · Jinyuan Jia (Pennsylvania State University) · Neil Zhenqiang Gong (Duke University)
Video Interpolation with Diffusion Models
Siddhant Jain (Google Research) · Daniel Watson (Google DeepMind) · Aleksander Holynski (UC Berkeley & Google Research) · Eric Tabellion (Google) · Ben Poole (Google) · Janne Kontkanen (Research, Google)
DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning
Shihua Zhang (Wuhan University) · Zizhuo Li (Wuhan University) · Yuan Gao (Wuhan University) · Jiayi Ma (Wuhan University)
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu (None) · Runyu He (Nanjing University) · Gangshan Wu (Nanjing University) · Limin Wang (Nanjing University)
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo (None) · Nian Liu (Mohamed bin Zayed University of Artificial Intelligence) · Wangbo Zhao (National University of Singapore) · Xuguang Yang (Northwestern Polytechnical University Xi'an) · Dingwen Zhang (Northwestern Polytechnical University) · Deng-Ping Fan (ETH Zurich) · Fahad Shahbaz Khan (MBZUAI; Linköping University) · Junwei Han (Northwestern Polytechnical University, Tsinghua University)
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Zhiwu Qing (Huazhong University of Science and Technology, Tsinghua University) · Shiwei Zhang (Alibaba Group) · Jiayu Wang (None) · Xiang Wang (Huazhong University of Science and Technology) · Yujie Wei (Fudan University) · Yingya Zhang (Alibaba Group) · Changxin Gao (Huazhong University of Science and Technology) · Nong Sang (Huazhong University of Science and Technology)
Self-supervised debiasing using low rank regularization
Geon Yeong Park (Korea Advanced Institute of Science and Technology) · Chanyong Jung (Korea Advanced Institute of Science and Technology) · Sangmin Lee (Korea Advanced Institute of Science & Technology) · Jong Chul Ye (Korea Advanced Institute of Science and Technology) · Sang Wan Lee (Korea Advanced Institute of Science & Technology)
Neural Markov Random Field for Stereo Matching
Tongfan Guan (The Chinese University of Hong Kong) · Chen Wang (University at Buffalo) · Yun-Hui Liu (The Chinese University of Hong Kong)
Ungeneralizable Examples
Jingwen Ye (National University of Singapore) · Xinchao Wang (National University of Singapore)
Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency
Xu Yingjie (None) · Bangzhen Liu (South China University of Technology) · Hao Tang (School of Computer Science and Engineering, Nanjing University of Science and Technology) · Bailin Deng (Cardiff University) · Shengfeng He (Singapore Management University)
Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs
Sunghwan Hong (Korea University) · Jaewoo Jung (Korea University) · Heeseong Shin (Korea University) · Jiaolong Yang (Microsoft Research) · Chong Luo (Microsoft Research Asia) · Seungryong Kim (Korea University)
ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
Jingyu Zhang (Fudan University) · Kun Yang (Fudan University) · Yilei Wang (Fudan University) · Hanqi Wang (Fudan University) · Peng Sun (Duke Kunshan University) · Liang Song (Fudan University)
SPAD: Spatially Aware Multiview Diffusers
Yash Kant (University of Toronto / Snap Research) · Aliaksandr Siarohin (Snap Inc.) · Ziyi Wu (University of Toronto) · Michael Vasilkovsky (Snap Inc.) · Guocheng Qian (KAUST) · Jian Ren (Snap Inc.) · Riza Alp Guler (Snap Inc.) · Bernard Ghanem (KAUST) · Sergey Tulyakov (Snap Inc.) · Igor Gilitschenski (University of Toronto)
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan (Nanjing University of Science and Technology) · Yuankai Lin (Huazhong University of Science and Technology) · Kun Wang (Nanjing University of Science and Technology) · Yupeng Zheng (Institute of Automation,Chinese Academy of Sciences) · Yufei Wang (Northwest Polytechnical University Xi'an) · Zhenyu Zhang (None) · Jun Li (Nanjing University of Science and Technology) · Jian Yang (Nanjing University of Science and Technology)
Text-to-3D Generation with Bidirectional Diffusion using both 3D and 2D priors
Lihe Ding (The Chinese University of Hong Kong) · Shaocong Dong (Hong Kong University of Science and Technology) · Zhanpeng Huang (SenseTime Research) · Zibin Wang (Sensetime Group Limited) · Yiyuan Zhang (The Chinese University of Hong Kong) · Kaixiong Gong (None) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology) · Tianfan Xue (The Chinese University of Hong Kong)
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu (Google Deepmind) · Kelvin C.K. Chan (Google DeepMind) · Yu-Chuan Su (Google) · Wenhu Chen (University of Waterloo) · Yandong Li (Google Research) · Kihyuk Sohn (Google) · Yang Zhao (Google) · Xue Ben (Google) · William Cohen (Google DeepMind) · Ming-Wei Chang (Google) · Xuhui Jia (Google)
Beyond Average: Individualized Visual Scanpath Prediction
Xianyu Chen (University of Minnesota) · Ming Jiang (University of Minnesota, Minneapolis) · Qi Zhao (University of Minnesota, Minneapolis)
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
Woo-Jin Ahn (Korea University) · Geun-Yeong Yang (Korea University) · Hyunduck Choi (Chonnam National University) · Myo-Taeg Lim (Korea University)
DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan (Tongji university) · Hansheng Chen (Stanford University) · Wei Tian (Tongji University) · Lu Xiong (Tongji University)
Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
Takashi Otonari (The University of Tokyo) · Satoshi Ikehata (NII, Tokyo Institute of Technology) · Kiyoharu Aizawa (The University of Tokyo)
Test-Time Domain Generalization for Face Anti-Spoofing
Qianyu Zhou (Shanghai Jiao Tong University) · Ke-Yue Zhang (Tencent) · Taiping Yao (Tencent Youtu Lab) · Xuequan Lu (La Trobe University) · Shouhong Ding (Tencent Youtu Lab) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation
Jin Wang (China University of Petroleum) · Bingfeng Zhang (China University of Petroleum (East China)) · Jian Pang (China University of Petroleum (East China)) · Honglong Chen (China University of Petroleum) · Weifeng Liu (China University of Petroleum (East China))
Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
Leonhard Sommer (University of Freiburg, Albert-Ludwigs-Universität Freiburg) · Artur Jesslen (University of Freiburg) · Eddy Ilg (None) · Adam Kortylewski (University of Freiburg & MPI-INF)
Fooling Polarization-based Vision using Locally Controllable Polarizing Projection
Zhuoxiao Li (The University of Tokyo) · Zhihang Zhong (Shanghai AI Lab) · Shohei Nobuhara (Kyoto Institute of Technology) · Ko Nishino (Kyoto University) · Yinqiang Zheng (None)
Affine Equivariant Networks Based on Differential Invariants
Yikang Li (Peking University) · Yeqing Qiu (The Chinese University of Hong Kong, Shenzhen) · Yuxuan Chen (Peking University) · Lingshen He (Peking University) · Zhouchen Lin (Peking University)
C3: High-performance and low-complexity neural compression from a single image or video
Hyunjik Kim (DeepMind) · Matthias Bauer (Google DeepMind) · Lucas Theis (Google) · Jonathan Richard Schwarz (Harvard University) · Emilien Dupont (Google DeepMind)
AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
Fan Yang (None) · Tianyi Chen (Nanyang Technological University) · XIAOSHENG HE (Nanyang Technological University) · Zhongang Cai (Nanyang Technological University) · Lei Yang (The Chinese University of Hong Kong) · Si Wu (South China University of Technology) · Guosheng Lin (Nanyang Technological University)
ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
Ruoxi Shi (University of California, San Diego) · Xinyue Wei (University of California, San Diego) · Cheng Wang (University of California, San Diego) · Hao Su (UCSD)
Monocular Identity-Conditioned Facial Reflectance Reconstruction
Xingyu Ren (Shanghai Jiao Tong University) · Jiankang Deng (Imperial College London & Huawei UKRD) · Yuhao Cheng (Shanghai Jiaotong University) · Jia Guo (InsightFace.AI) · Chao Ma (Shanghai Jiao Tong University) · Yichao Yan (Shanghai Jiao Tong University) · Wenhan Zhu (None) · Xiaokang Yang (Shanghai Jiao Tong University, China)
3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian (Department of Computer Science, ETHZ - ETH Zurich) · Shaofei Wang (None) · Marko Mihajlovic (Swiss Federal Institute of Technology) · Andreas Geiger (University of Tübingen) · Siyu Tang (ETH Zurich)
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu (South China Normal University) · Chenshu Xu (Singapore Management University) · Yifei Yang (Singapore Management University) · Lihua Zeng (South China Normal University) · Shengfeng He (Singapore Management University)
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang (Beijing University of Posts and Telecommunications) · Lei Jin (Beijing University of Posts and Telecommunications) · Zheng Wang (Wuhan University) · Jianshu Li (Ant Group) · Liang Li (None) · Fang Zhao (Tencent AI Lab) · Yu Cheng (National University of Singapore) · Li Yuan (Peking University) · Li ZHOU (Wuhan University) · Junliang Xing (Tsinghua University) · Jian Zhao ()
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Gianluca Scarpellini (Università degli Studi di Genova, Istituto Italiano di Tecnologia) · Stefano Fiorini (Istituto Italiano di Tecnologia) · Francesco Giuliari (Istituto Italiano di Tecnologia) · Pietro Morerio (Istituto Italiano di Tecnologia) · Alessio Del Bue (Istituto Italiano di Tecnologia (IIT))
MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao (Baidu) · Yifan Sun (Baidu Research) · Wenhao Wang (University of Technology Sydney) · Qiang Chen (Baidu) · Errui Ding (Baidu Inc.) · Yi Yang (Zhejiang University) · Jingdong Wang (Baidu)
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
Xu Peng (Xiamen University) · Junwei Zhu (Tencent Youtu Lab) · Boyuan Jiang (Tencent Youtu Lab) · Ying Tai (Nanjing University) · Donghao Luo (Tencent YouTu Lab) · Jiangning Zhang (Tencent Youtu Lab) · Wei Lin (Xiamen University) · Taisong Jin (Xiamen University) · Chengjie Wang (Tencent Youtu Lab; Shanghai Jiao Tong University) · Rongrong Ji (Xiamen University)
HDQMF: Holographic Feature Decomposition Using Quantum Algorithms
Prathyush Poduval (University of California, Irvine) · Zhuowen Zou (University of California, Irvine) · Mohsen Imani (University of California, Irvine)
Fair-VPT: Fair Visual Prompt Tuning for Image Classification
Sungho Park (Yonsei university) · Hyeran Byun (Yonsei University)
Task-conditioned adaptation of visual features in multi-task policy learning
Pierre Marza (Institut National des Sciences Appliquées de Lyon) · Laetitia Matignon (LIRIS, CNRS) · Olivier Simonin (INSA de Lyon) · Christian Wolf (Naver Labs Europe)
Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
Jinxia Xie (Guangxi Normal University) · Bineng Zhong (Guangxi Normal University) · Zhiyi Mo (Wuzhou university) · Shengping Zhang (Harbin Institute of Technology) · Liangtao Shi (Guangxi Normal University) · Shuxiang Song (Guangxi Normal University) · Rongrong Ji (Xiamen University)
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong (KAIST) · Daehee Park (KAIST) · Kuk-Jin Yoon (KAIST)
Revisiting Single Image Reflection Removal In the Wild
Yurui Zhu (University of Science and Technology of China) · Bo Li (vivo Mobile Communication Co.,Ltd.) · Xueyang Fu (University of Science and Technology of China) · Peng-Tao Jiang (vivo Mobile Communication (Hangzhou) Co., Ltd.) · Hao Zhang (vivo Mobile Communication (Hangzhou) Co., Ltd) · Qibin Sun (University of Science and Technology of China) · Zheng-Jun Zha (University of Science and Technology of China) · Jinwei Chen (vivo Mobile Communication Co., Ltd.)
Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction
Zhenzhong Kuang (Hangzhou Dianzi University) · Xiaochen Yang (Hangzhou Dianzi University) · Yingjie Shen (Hangzhou Dianzi University) · Chao Hu (Hangzhou Dianzi University) · Jun Yu (Hangzhou Dianzi University)
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Yi Yu (Southeast University) · Xue Yang (Shanghai AI Laboratory) · Qingyun Li (Harbin Institute of Technology) · Feipeng Da (Southeast University) · Jifeng Dai (Tsinghua University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Junchi Yan (Shanghai Jiao Tong University)
TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
Bo Sun (University of Texas, Austin) · Thibault Groueix (Adobe Systems) · Chen Song (University of Texas at Austin) · Qixing Huang (University of Texas at Austin) · Noam Aigerman (Université de Montréal)
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Hmrishav Bandyopadhyay (University of Surrey) · Subhadeep Koley (University of Surrey) · Ayan Das (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Aneeshan Sain (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Tao Xiang (University of Surrey) · Yi-Zhe Song (None)
Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence
Ripon Saha (Arizona State University) · Dehao Qin (Clemson University) · Nianyi Li (None) · Jinwei Ye (None) · Suren Jayasuriya (Arizona State University)
IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration
Tai Ma (East China Normal University) · zhangsuwei (East China Normal University) · Jiafeng Li (East China Normal University) · Ying Wen (East China Normal University)
Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Linwei Chen (Beijing Institute of Technology) · Lin Gu (RIKEN / the University of Tokyo) · Dezhi Zheng (None) · Ying Fu (None)
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue (Institute of automation, Chinese academy of science) · Jie Cheng (State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA) · Longteng Guo (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Xingyuan Dai (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Zijia Zhao (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Xingjian He (Institute of automation, Chinese academy of science) · Gang Xiong (Institute of Automation, Chinese Academy of Science) · Yisheng Lv (Institute of Automation, Chinese Academy of Science) · Jing Liu (Institute of automation, Chinese academy of science)
Style Aligned Image Generation via Shared Attention
Amir Hertz (Tel Aviv University) · Andrey Voynov (Google Research) · Shlomi Fruchter (Research, Google) · Daniel Cohen-Or (Google)
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Yuchao Gu (None) · Yipin Zhou (Facebook) · Bichen Wu (Facebook) · Licheng Yu (None) · Jia-Wei Liu (National University of Singapore) · Rui Zhao (None) · Jay Zhangjie Wu (National University of Singapore) · David Junhao Zhang (National University of Singapore) · Mike Zheng Shou (National University of Singapore) · Kevin Tang (Meta)
Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining
Qi Cui (Nanyang Technological University) · Ruohan Meng (Nanyang Technological University) · Chaohui Xu (Nanyang Technological University) · Chip Hong Chang (Nanyang Technological University)
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
Jiehong Lin (South China University of Technology) · lihua liu (South China University of Technology) · Dekun Lu (South China University of Technology) · Kui Jia (South China University of Technology)
MatSynth: A Modern PBR Materials Dataset
Giuseppe Vecchio (University of Catania) · Valentin Deschaintre (Adobe Research)
$MonoDiff$: Monocular 3D Object Detection and Pose Estimation with Diffusion Models
Yasiru Ranasinghe (Johns Hopkins University) · Deepti Hegde (Johns Hopkins University) · Vishal M. Patel (Johns Hopkins University)
Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization
Yujia Liu (School of Computer Science, Peking University, Beijing, China) · Chenxi Yang (Peking University) · Dingquan Li (Peng Cheng Laboratory) · Jianhao Ding (Peking University) · Tingting Jiang (Peking University)
BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
Fengyuan Shi (Nanjing University) · Jiaxi Gu (Huawei Noah's Ark Lab) · Hang Xu (Huawei Noah's Ark Lab) · Songcen Xu (Huawei Noah's Ark Lab) · Wei Zhang (Huawei Technologies Ltd.) · Limin Wang (Nanjing University)
Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
Huicong Zhang (Harbin Institute of Technology) · Haozhe Xie (Nanyang Technological University) · Hongxun Yao (Harbin Institute of Technology)
Bi-Causal: Group Activity Recognition via Bidirectional Causality
Youliang Zhang (Wuhan University) · Wenxuan Liu (Wuhan University of Technology) · danni xu (National University of Singapore) · Zhuo Zhou (Wuhan University) · Zheng Wang (Wuhan University)
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
Chulin Xie (University of Illinois, Urbana Champaign) · De-An Huang (NVIDIA) · Wenda Chu (California Institute of Technology) · Daguang Xu (NVIDIA) · Chaowei Xiao (Arizona State University) · Bo Li (UIUC) · Anima Anandkumar (California Institute of Technology)
How to Train Neural Field Representations: A Comprehensive Study and Benchmark
Samuele Papa (University of Amsterdam) · Riccardo Valperga (University of Amsterdam) · David Knigge (University of Amsterdam) · Miltiadis Kofinas (University of Amsterdam) · Phillip Lippe (University of Amsterdam) · Jan-Jakob Sonke (Netherlands Cancer Institute) · Efstratios Gavves ()
Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias
Wenyu Zhang (Institute for Infocomm Research, A*STAR) · Qingmu Liu (National University of Singapore) · Felix Ong (National University of Singapore) · Mohamed Ragab (Institute for Infocomm Research, A*STAR) · Chuan-Sheng Foo (Centre for Frontier AI Research, A*STAR)
Semantic-Aware Multi-Label Adversarial Attacks
Hassan Mahmood (Northeastern University) · Ehsan Elhamifar (None)
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan (The Chinese University of Hong Kong, Shenzhen) · Jinke Ren (The Chinese University of Hong Kong, Shenzhen) · Chun-Mei Feng (None) · Hengshuang Zhao (The University of Hong Kong) · Shuguang Cui (The Chinese University of Hong Kong, Shenzhen) · Zhen Li (The Chinese University of Hong Kong, Shenzhen)
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu (Xi'an Jiaotong University) · Xuan Wang (Megvii Technology Inc.) · Xiangyu Xu (Xi'an Jiaotong University) · Ruhao Tian (Xi'an Jiaotong University) · Shuai Li (Megvii Technology Inc.) · Xueming Qian (Xi'an Jiaotong University, Tsinghua University) · Ming-Hsuan Yang (University of California at Merced)
The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement
Gabriele Trivigno (None) · Carlo Masone (Politecnico di Torino) · Barbara Caputo (Politecnico di Torino) · Torsten Sattler (Czech Technical University in Prague)
PointInfinity: Resolution-Invariant Point Diffusion Models
Zixuan Huang (University of Illinois Urbana-Champaign) · Justin Johnson (University of Michigan) · Shoubhik Debnath (FAIR, Meta) · James Rehg (None) · Chao-Yuan Wu (Meta)
F$^3$Loc: Fusion and Filtering for Floorplan Localization
Changan Chen (None) · Rui Wang (Microsoft) · Christoph Vogel (Microsoft) · Marc Pollefeys (ETH Zurich / Microsoft)
ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
Shuxiao Ding (Mercedes-Benz AG & University of Bonn) · Lukas Schneider (Mercedes Benz Research & Development) · Marius Cordts (Mercedes-Benz) · Jürgen Gall (University of Bonn)
Construct to Associate: Cooperative Context Learning for Domain Adaptive Point Cloud Segmentation
Guangrui Li (University of Technology Sydney)
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
Gabriele Berton (None) · Alex Stoken (University of Texas at Austin) · Barbara Caputo (Politecnico di Torino) · Carlo Masone (Politecnico di Torino)
Relation Rectification in Diffusion Model
Yinwei Wu (National University of Singapore) · Xingyi Yang (National University of Singapore) · Xinchao Wang (National University of Singapore)
Close Imitation of Expert Retouching for Black-and-White Photography
Seunghyun Shin (GIST) · Jisu Shin (None) · Jihwan Bae (CHA University, School of Medicine) · Inwook Shim (Inha University) · Hae-Gon Jeon (GIST)
AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search
Junghyup Lee (Yonsei University) · Bumsub Ham (Yonsei University)
Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training
Di Ming (Chongqing University of Technology) · Peng Ren (Chongqing University of Technology) · Yunlong Wang (IQVIA) · Xin Feng (Chongqing University of Technology)
Accelerating Diffusion Sampling with Optimized Time Steps
Shuchen Xue (Academy of Mathematics and Systems Science, Chinese Academy of Sciences) · Zhaoqiang Liu (University of Electronic Science and Technology of China) · Fei Chen (Huawei Noah's Ark Lab) · Shifeng Zhang (Huawei Technologies Ltd.) · Tianyang Hu (Huawei Noah's Ark Lab) · Enze Xie (Huawei Noah's Ark Lab) · Zhenguo Li (Huawei)
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
Hancheng Ye (Fudan University) · Chong Yu (Fudan University NVIDIA Corporation) · Peng Ye (Fudan University) · Renqiu Xia (Shanghai Jiao Tong University) · Bo Zhang (Shanghai AI Laboratory) · Yansong Tang (Tsinghua University) · Jiwen Lu (Tsinghua University) · Tao Chen (Fudan University)
OneFormer3D: One Transformer for Unified Point Cloud Segmentation
Maksim Kolodiazhnyi (Samsung) · Anna Vorontsova (Samsung) · Anton Konushin (Samsung) · Danila Rukhovich (Samsung Research)
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
Junbo Yin (Beijing Institute of Technology) · Wenguan Wang (Zhejiang University) · Runnan Chen (None) · Wei Li (Inceptio) · Ruigang Yang (Inceptio) · Pascal Frossard (EPFL) · Jianbing Shen (University of Macau)
NC-TTT: A Noise Contrastive Approach for Test-Time Training
David OSOWIECHI (École de Technologie Supérieure, ETS Montreal) · Gustavo Vargas Hakim (École de technologie supérieure, Université du Québec) · Mehrdad Noori (École de technologie supérieure, Université du Québec) · Milad Cheraghalikhani (École de technologie supérieure, Université du Québec) · Ali Bahri (École de technologie supérieure, Université du Québec) · Moslem Yazdanpanah (École de technologie supérieure, Université du Québec) · Ismail Ben Ayed (ETS Montreal) · Christian Desrosiers (École de technologie supérieure)
One-Shot Structure-Aware Stylized Image Synthesis
Hansam Cho (Korea University) · Jonghyun Lee (Korea University) · Seunggyu Chang (NAVER Cloud) · Yonghyun Jeong (NAVER)
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Haokai Pang (ETH Zurich) · Heming Zhu (Max Planck Institute for Informatics, Saarland Informatics Campus) · Adam Kortylewski (University of Freiburg & MPI-INF) · Christian Theobalt (MPI Informatik) · Marc Habermann (Saarland Informatics Campus, Max-Planck Institute)
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
Kai Xu (National University of Singapore) · Ziwei Yu (None) · Xin Wang (Huawei Technologies Ltd.) · Michael Bi Mi (Huawei Technologies Ltd.) · Angela Yao (National University of Singapore)
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh (Massachusetts Institute of Technology) · Chih-Wei Wu (Netflix) · Iroro Orife (Netflix) · Mahdi Kalayeh (Netflix)
Grounded Text-to-Image Synthesis with Attention Refocusing
Quynh Phung (University of Maryland, College Park) · Songwei Ge (University of Maryland, College Park) · Jia-Bin Huang (University of Maryland, College Park)
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
Pavlo Melnyk (Computer Vision Laboratory, Linköping University) · Andreas Robinson (Linköping University) · Michael Felsberg (Linköping University) · Mårten Wadenbäck (Linköping University)
RTracker: Recoverable Tracking via PN Tree Structured Memory
Yuqing Huang (Harbin Institute of Technology) · Xin Li (Peng Cheng Laboratory) · Zikun Zhou (Peng Cheng Laboratory) · Yaowei Wang (Pengcheng Laboratory) · Zhenyu He (Harbin Institute of Technology, Shenzhen) · Ming-Hsuan Yang (University of California at Merced)
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren (None) · Linli Yao (Peking University) · Shicheng Li (Peking University) · Xu Sun (Peking University) · Lu Hou (Huawei Technologies Ltd.)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen (Nanjing University) · Jiannan Wu (University of Hong Kong) · Wenhai Wang (Shanghai AI Laboratory) · Weijie Su (University of Science and Technology of China) · Guo Chen (Nanjing University) · Sen Xing (Tsinghua University) · Zhong Muyan (Tsinghua University) · Qing-Long Zhang (Shanghai Artificial Intelligence Laboratory) · Xizhou Zhu (Shanghai AI Laboratory) · Lewei Lu (SenseTime) · Bin Li (University of Science and Technology of China) · Ping Luo (The University of Hong Kong) · Tong Lu (Nanjing University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Jifeng Dai (Tsinghua University)
Tyche: Stochastic in Context Learning for Medical Image Segmentation
Marianne Rakic (Massachusetts Institute of Technology) · Hallee Wong (MIT) · Jose Javier Gonzalez Ortiz (DataBricks) · Beth Cimini (Broad Institute) · John Guttag (Massachusetts Institute of Technology) · Adrian V. Dalca (Harvard University)
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Eunsu Baek (Seoul National University) · Keondo Park (Seoul National University) · Ji-yoon Kim (Seoul National University) · Hyung-Sin Kim (Seoul National University)
CLOAF: CoLlisiOn-Aware Human Flow
Andrey Davydov (EPFL) · Martin Engilberge (EPFL - EPF Lausanne) · Mathieu Salzmann (EPFL) · Pascal Fua (Swiss Federal Institute of Technology Lausanne)
What, How, and When Should Object Detectors Update in Continually Changing Test Domains?
Jayeon Yoo (Seoul National University) · Dongkwan Lee (Seoul National University) · Inseop Chung (Seoul National University) · Donghyun Kim (Korea University) · Nojun Kwak (Seoul National University)
Learning Correlation Structures for Vision Transformers
Manjin Kim (POSTECH) · Paul Hongsuck Seo (Google) · Cordelia Schmid (Inria / Google) · Minsu Cho (POSTECH)
CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
Lianggangxu Chen (East China Normal University) · Xuejiao Wang (East China Normal University) · Jiale Lu (East China Normal University) · Shaohui Lin (East China Normal University) · Changbo Wang (East China Normal University) · Gaoqi He (East China Normal University)
Equivariant plug-and-play image reconstruction
Matthieu Terris (INRIA) · Thomas Moreau (INRIA) · Nelly Pustelnik (CNRS) · Julián Tachella (CNRS)
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores (Université Côte d'Azur) · Lucile Sassatelli (Universite Cote d'Azur) · Hui-Yin Wu (Inria at Université Côte d'Azur) · Clement Bergman (INRIA) · Léa Andolfi (CELSA-Sorbonne) · Victor Ecrement (Sorbonne Université) · Frederic Precioso (Universite Cote d'Azur) · Thierry Devars (Université Paris-Sorbonne (Paris IV)) · Magali GUARESI (CNRS) · Virginie Julliard (Université Paris-Sorbonne (Paris IV)) · Sarah Lécossais (Sorbonne Paris Nord)
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu (University of Science and Technology of China) · Kun Yin (Tencent YouTu Lab) · Haoyu Cao (Tencent Youtu Lab) · Xinghua Jiang (None) · Xin Li (Tencent Youtu Lab) · Yinsong Liu (Tencent Youtu Lab) · Deqiang Jiang (Tencent YouTu Lab) · Xing Sun (Tencent YouTu Lab) · Linli Xu (University of Science and Technology of China)
Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
Hao Jiang (Peking University) · Bingfeng Zhou (Peking University) · Yadong Mu (Peking University)
Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
Hanyu Zhou (Huazhong University of Science and Technology) · Yi Chang (Huazhong University of Science and Technology) · Zhiwei Shi (Huazhong University of Science and Technology)
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
Zijia Lu (Northeastern University) · Ehsan Elhamifar (None)
Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching
Shreyas Fadnavis (Johnson and Johnson) · Agniva Chowdhury (Oak Ridge National Laboratory) · Joshua Batson (Anthropic) · Petros Drineas (Purdue University) · Eleftherios Garyfallidis (Indiana University)
Optimizing Diffusion Noise Can Serve As Universal Motion Priors
Korrawe Karunratanakul (ETH Zurich) · Konpat Preechakul (University of California, Berkeley) · Emre Aksan (Google) · Thabo Beeler (Google) · Supasorn Suwajanakorn (Vidyasirimedhi Institute of Science and Technology) · Siyu Tang (ETH Zurich)
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei (University of Michigan - Ann Arbor) · Zixuan Pan (University of Michigan - Ann Arbor) · Andrew Owens (University of Michigan)
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li (Xiamen University) · Ying Chen (Xiamen University) · Yifei Chen (Huawei) · Rongshan Yu (National University of Singapore) · Wenxian Yang (Aginome Scientific) · Liansheng Wang (Xiamen University, Tsinghua University) · Bowen Ding (Shanghai Jiaotong University) · Yuchen Han (Shanghai Jiaotong University)
Generative Unlearning for Any Identity
Juwon Seo (Kyung Hee University) · Sung-Hoon Lee (Kyung Hee University) · Tae-Young Lee (Kyung Hee University) · SeungJun Moon (KLleon) · Gyeong-Moon Park (Kyung Hee University)
Enhancing Multimodal Cooperation via Sample-level Modality Valuation
Yake Wei (Renmin University of China) · Ruoxuan Feng (Renmin University of China) · Zihe Wang (Renmin University of China) · Di Hu (Renmin University of China)
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation
Xiongwei Wu (HyperGAI) · Sicheng Yu (Bytedance) · Ee-Peng Lim (Singapore Management University) · Chong Wah Ngo (Singapore Management University)
CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection
Haonan Zhang (Xi'an Jiaotong University) · Longjun Liu (Xi'an Jiaotong University) · Yuqi Huang (Xi'an Jiaotong University) · Yang Zhao (Xi'an Jiaotong University) · Xinyu Lei (Xi'an Jiaotong University) · Bihan Wen (Nanyang Technological University)
SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations
Pu Li (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Jianwei Guo (Institute of Automation, Chinese Academy of Sciences) · HUIBIN LI (Institute of Automation, Chinese Academy of Sciences) · Bedrich Benes (Purdue University) · Dong-Ming Yan (Institute of Automation, Chinese Academy of Sciences)
Neural Redshift: Random Networks are not Random Functions
Damien Teney (Idiap Research Institute) · Armand Nicolicioiu (ETHZ - ETH Zurich) · Valentin Hartmann (EPFL) · Ehsan Abbasnejad (University of Adelaide)
SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
Yu-Bang Zheng (Southwest Jiaotong University) · Xile Zhao (University of Electronic Science and Technology of China) · Junhua Zeng (RIKEN) · Chao Li (RIKEN) · Qibin Zhao (RIKEN) · Heng-Chao Li (Southwest Jiaotong University) · Ting-Zhu Huang (University of Electronic Science and Technology of China)
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei (Peking University) · Shaofeng Yin (Peking University) · Yang Liu (Peking University)
Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
Shihao Zhou (Nankai University) · Duosheng Chen (Nankai University) · Jinshan Pan (Nanjing University of Science and Technology) · Jinglei Shi (Nankai University) · Jufeng Yang (None)
DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li (Massachusetts Institute of Technology) · Hsiaoyu Chen (Meta) · Egor Larionov (Meta) · Nikolaos Sarafianos (Meta Reality Labs) · Wojciech Matusik (Massachusetts Institute of Technology) · Tuur Stuyck (Meta)
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Guan Wang (Peking University) · Zhimin Li (Tencent Data Platform) · Qingchao Chen (Peking University) · Yang Liu (Peking University)
On the Estimation of Image-matching Uncertainty in Visual Place Recognition
Mubariz Zaffar (Delft University of Technology) · Liangliang Nan (Delft University of Technology) · Julian F. P. Kooij (Delft University of Technology)
Learning to Transform Dynamically for Better Adversarial Transferability
Rongyi Zhu (None) · Zeliang Zhang (University of Rochester) · Susan Liang (University of Rochester) · Zhuo Liu (University of Rochester) · Chenliang Xu (University of Rochester)
3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces
Linyi Jin (None) · Nilesh Kulkarni (None) · David Fouhey (New York University)
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Yuelin Zhang (The Chinese University of Hong Kong) · Pengyu Zheng (The Chinese University of Hong Kong) · Wanquan Yan (The Chinese University of Hong Kong) · Chengyu Fang (Tsinghua University) · Shing Shin Cheng (The Chinese University of Hong Kong)
Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships
Rangel Daroya (University of Massachusetts Amherst) · Aaron Sun (University of Massachusetts Amherst) · Subhransu Maji (University of Massachusetts, Amherst)
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
Cansu Korkmaz (Koc University) · Ahmet Murat Tekalp (Koç University) · Zafer Dogan (Koc University)
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
Gang Zhang (Tsinghua University) · Chen Junnan (Huazhong University of Science and Technology) · Guohuan Gao (Beijing Institute of Technology) · Jianmin Li (Department of computer science and technology, Tsinghua University) · Si Liu (Beihang University) · Xiaolin Hu (Tsinghua University)
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
Ruiyang Hao (Institute for AI Industry Research, Tsinghua University) · Siqi Fan (Institute for AI Industry Research, Tsinghua University) · Yingru Dai (Tsinghua University) · Zhenlin Zhang (China Automotive Innovation Corporation) · Chenxi Li (CAIC) · Yuntian Wang (China Automotive Innovation Corporation) · Haibao Yu (University of Hong Kong) · Wenxian Yang (Tsinghua University) · Jirui Yuan (Tsinghua University) · Zaiqing Nie (Tsinghua University)
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
Le Xue (None) · Ning Yu (Salesforce Research) · Shu Zhang (SalesForce.com) · Artemis Panagopoulou (University of Pennsylvania) · Junnan Li (None) · Roberto Martín-Martín (University of Texas at Austin) · Jiajun Wu (Stanford University) · Caiming Xiong (Salesforce Research) · Ran Xu (SalesForce.com) · Juan Carlos Niebles (Salesforce Research) · Silvio Savarese (Salesforce)
CFAT: Unleashing Triangular Windows for Image Super-resolution
Abhisek Ray (Indian Institute of Technology, Patna) · Gaurav Kumar (Indian Institute of Technology (IIT), Patna) · Maheshkumar Kolekar (Indian Institute of Technology, Patna)
Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing
Yafei Zhang (Kunming University of Science and Technology) · Shen Zhou (Kunming University of Science and Technology) · Huafeng Li (Kunming University of Science and Technology)
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Zhongcong Xu (National University of Singapore) · Jianfeng Zhang (NUS) · Jun Hao Liew (ByteDance) · Hanshu Yan (ByteDance) · Jia-Wei Liu (National University of Singapore) · Chenxu Zhang (Bytedance) · Jiashi Feng (ByteDance) · Mike Zheng Shou (National University of Singapore)
Relaxed Contrastive Learning for Federated Learning
Seonguk Seo (Seoul National University) · Jinkyu Kim (Seoul National University) · Geeho Kim (Seoul National University) · Bohyung Han (Seoul National University)
LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry
Weirong Chen (Technical University of Munich) · Le Chen (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Rui Wang (Microsoft) · Marc Pollefeys (ETH Zurich / Microsoft)
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Tianhao Qi (University of Science and Technology of China) · Shancheng Fang (University of Science and Technology of China) · Yanze Wu (ByteDance Inc.) · Hongtao Xie (University of Science and Technology of China) · Jiawei Liu (ByteDance Inc.) · Lang Chen (ByteDance) · Qian HE (Institute of Remote Sensing Application, Chinese Academy of Sciences) · Yongdong Zhang (University of Science and Technology of China)
CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data
Wei Fang (Alibaba Group) · Yuxing Tang (Alibaba Group) · Heng Guo (Alibaba Group) · Mingze Yuan (Peking University) · Tony C. W. MOK (Alibaba DAMO Academy) · Ke Yan (Alibaba DAMO Academy) · Jiawen Yao (Alibaba Group) · Xin Chen (Guangzhou First People's Hospital) · Zaiyi Liu (Guangdong General Hospital) · Le Lu (Alibaba Group) · Ling Zhang (Alibaba Group) · Minfeng Xu (Alibaba Group)
Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples
Yuyang Yu (South China University of Technology) · Bangzhen Liu (South China University of Technology) · Chenxi Zheng (None) · Xuemiao Xu (South China University of Technology) · Huaidong Zhang (South China University of Technology) · Shengfeng He (Singapore Management University)
Can’t make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal (Carnegie Mellon University) · Nakul Agarwal (Honda Research Institute USA) · Shao-Yuan Lo (Johns Hopkins University) · Kwonjoon Lee (Honda Research Institute USA)
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia (Shanghai Jiaotong University) · Chih-Hao Lin (None) · Wei-Chiu Ma (Cornell University) · Shenlong Wang (University of Illinois, Urbana Champaign)
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
Jiawei Yao (University of Washington) · Qi Qian (Alibaba Group) · Juhua Hu (University of Washington)
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Junyan Wang (University of New South Wales) · Zhenhong Sun (University of New South Wales) · Stewart Tan (Alibaba DAMO Academy) · Xuanbai Chen (Carnegie Mellon University) · Weihua Chen (Alibaba Group) · li (None) · Cheng Zhang (Carnegie Mellon University) · Yang Song (University of New South Wales)
Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Hanyang Chi (None) · Jian Pang (China University of Petroleum (East China)) · Bingfeng Zhang (China University of Petroleum (East China)) · Weifeng Liu (China University of Petroleum (East China))
Targeted Representation Alignment for Open-World Semi-Supervised Learning
Ruixuan Xiao (Zhejiang University) · Lei Feng (Nanyang Technological University) · Kai Tang (Zhejiang University) · Junbo Zhao (Zhejiang University) · Yixuan Li (University of Wisconsin Madison) · Gang Chen (College of Computer Science and Technology, Zhejiang University) · Haobo Wang (Zhejiang University)
Contrasting intra-modal and ranking cross-modal hard negatives to enhance visio-linguistic compositional understanding
Le Zhang (Mila-Quebec AI Institute) · Rabiul Awal (None) · Aishwarya Agrawal (None)
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang (None) · Hongtao Wu (The Hong Kong University of Science and Technology (Guangzhou)) · Angelica I. Aviles-Rivero (University of Cambridge) · Yulun Zhang (Shanghai Jiao Tong University) · Jing Qin (Hong Kong Polytechnic University) · Lei Zhu (Hong Kong University of Science and Technology (Guangzhou) & HKUST)
FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding
Jun Xiang (University of Science and Technology of China) · Xuan Gao (University of Science and Technology of China) · Yudong Guo (Image Derivative Inc) · Juyong Zhang (University of Science and Technology of China)
DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi (New York University, Abu Dhabi) · Ruitian Ding (New York University) · Muhammad Abdullah Hanif (New York University, Abu Dhabi) · Ihsen Alouani (The Queen's University Belfast) · Muhammad Shafique (New York University Abu Dhabi)
T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory
Daehee Park (KAIST) · Jaeseok Jeong (KAIST) · Sung-Hoon Yoon (KAIST) · Jaewoo Jeong (KAIST) · Kuk-Jin Yoon (KAIST)
Dynamic Support Information Mining for Category-Agnostic Pose Estimation
Pengfei Ren (Beijing University of Posts and Telecommunications) · Yuanyuan Gao (Beijing University of Posts and Telecommunications) · Haifeng Sun (Beijing University of Posts and Telecommunications) · Qi Qi (Beijing University of Posts and Telecommunications) · Jingyu Wang (Beijing University of Post and Telecommunication, Tsinghua University) · Jianxin Liao (Beijing University of Posts and Telecommunications)
Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching
Matteo Bastico (Mines Paris - PSL) · Etienne Decencière (Mines Paris) · Laurent Corté (Mines ParisTech) · Yannick TILLIER (Mines Paris - PSL) · David Ryckelynck (Mines Paris PSL University)
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po (Stanford University) · Guandao Yang (None) · Kfir Aberman (Google) · Gordon Wetzstein (Stanford University)
MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
Zhicheng Zhang (Nankai University) · Pancheng Zhao (Nankai University) · Eunil Park (Sungkyunkwan University) · Jufeng Yang (None)
Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning
Yun Li (CSIRO's Data61) · Zhe Liu (Tiktok) · Hang Chen (Snap Inc.) · Lina Yao (CSIRO's Data61 and University of New South Wales)
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
Lorenzo Bianchi (CNR-ISTI) · Fabio Carrara (CNR-ISTI) · Nicola Messina (Institute of Information Science and Technologies - National Research Council (ISTI-CNR)) · Claudio Gennaro (CNR) · Fabrizio Falchi (CNR)
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak (Korea University) · Erqun Dong (University of British Columbia) · Yuhe Jin (University of British Columbia) · Hanseok Ko (Korea University) · Shweta Mahajan (University of British Columbia) · Kwang Moo Yi (University Of British Columbia)
CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning
Hyuck Lee (Korea Advanced Institute of Science and Technology) · Heeyoung Kim (Korea Advanced Institute of Science and Technology)
NeRF Director: Revisiting View Selection in Neural Volume Rendering
Wenhui Xiao (Queensland University of Technology) · Rodrigo Santa Cruz (CSIRO) · David Ahmedt-Aristizabal (CSIRO) · Olivier Salvado (CSIRO) · Clinton Fookes (Queensland University of Technology) · Leo Lebrat (CSIRO / QUT)
Map-Relative Pose Regression for Visual Re-Localization
Shuai Chen (University of Oxford) · Tommaso Cavallari (Niantic Inc.) · Victor Adrian Prisacariu (None) · Eric Brachmann (None)
MANUS: Markerless Grasp Capture using Articulated 3D Gaussians
Chandradeep Pokhariya (International Institute of Information Technology, Hyderabad) · Ishaan Shah (International Institute of Information Technology, Hyderabad) · Angela Xing (Brown University) · Zekun Li (Tencent AI Lab) · Kefan Chen (Brown University) · Avinash Sharma (International Institute of Information Technology Hyderabad) · Srinath Sridhar (None)
Towards Generalizable Tumor Synthesis
Qi Chen (University of Science and Technology of China) · Xiaoxi Chen (None) · Haorui Song (Johns Hopkins University) · Alan L. Yuille (Johns Hopkins University) · Zhiwei Xiong (None) · Chen Wei (Johns Hopkins University) · Zongwei Zhou (Johns Hopkins University)
Diversified and Personalized Multi-rater Medical Image Segmentation
Yicheng Wu (Monash University) · Xiangde Luo (University of Electronic Science and Technology of China) · Zhe Xu (The Chinese University of Hong Kong; Harvard Medical School) · Xiaoqing Guo (University of Oxford) · Lie Ju (Monash University) · Zongyuan Ge (Monash University) · Wenjun Liao (University of Electronic Science and Technology of China) · Jianfei Cai (Monash University)
ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image
Marco Pesavento (University of Surrey) · Yuanlu Xu (Meta Reality Labs Research) · Nikolaos Sarafianos (Meta Reality Labs) · Robert Maier (Meta) · Ziyan Wang (Carnegie Mellon University) · Chun-Han Yao (University of California at Merced) · Marco Volino (University of Surrey) · Edmond Boyer (INRIA Grenoble Rhône-Alpes) · Adrian Hilton (University of Surrey) · Tony Tung (Meta)
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya (Stony Brook University) · Anurag Arnab (Google) · Arsha Nagrani (Google) · Michael Ryoo (Stony Brook University)
Point Transformer V3: Simpler, Faster, Stronger
Xiaoyang Wu (The University of Hong Kong) · Li Jiang (Max Planck Institute for Informatics) · Peng-Shuai Wang (Peking University) · Zhijian Liu (Massachusetts Institute of Technology) · Xihui Liu (The University of Hong Kong) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Wanli Ouyang (University of Sydney) · Tong He (Shanghai AI Lab) · Hengshuang Zhao (The University of Hong Kong)
Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion
Yujie Xue (HNU) · Ruihui Li (Hunan University) · Fan Wu (Wuhan University) · Zhuo Tang (Hunan University) · Kenli Li (Hunan University) · Duan Mingxing (Hunan University)
FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences
Haobo Xu (None) · Jun Zhou (Shanghai Jiaotong University) · Hua Yang (Shanghai Jiaotong University) · Renjie Pan (Shanghai Jiaotong University) · Cunyan Li (Shanghai Jiaotong University)
Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang (Institute for Logic, Language and Computation, University of Amsterdam) · Qizhe Zhang (Peking University) · Zijun Gao (Shandong University) · Renrui Zhang (MMLab of CUHK & Shanghai AI Laboratory) · Ekaterina Shutova (University of Amsterdam) · Shiji Zhou (Tsinghua University) · Shanghang Zhang (Peking University)
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Xiaohan Ding (Tencent AI Lab) · Yiyuan Zhang (The Chinese University of Hong Kong) · Yixiao Ge (Tencent) · Sijie Zhao (Tencent AI Lab) · Lin Song (Tencent AI Lab) · Xiangyu Yue (The Chinese University of Hong Kong) · Ying Shan (Tencent)
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yiran Qin (The Chinese University of Hong Kong (Shenzhen)) · Enshen Zhou (Shanghai AI Laboratory) · Qichang Liu (None) · Zhenfei Yin (University of Sydney) · Lu Sheng (Beihang University) · Ruimao Zhang (The Chinese University of Hong Kong (Shenzhen)) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Jing Shao (Shanghai AI Laboratory)
Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements
Niccolò Biondi (University of Florence, Italy) · Federico Pernici (University of Florence, Italy) · Simone Ricci (University of Florence, Italy) · Alberto Del Bimbo (Università degli Studi di Firenze)
A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang (East China Normal University) · Yunhang Shen (Tencent) · Jiao Xie (Xiamen University) · Baochang Zhang (Beihang University) · Gaoqi He (East China Normal University) · Ke Li (Tencent) · Xing Sun (Tencent YouTu Lab) · Shaohui Lin (East China Normal University)
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li (Tencent Youtu Lab) · Yunfei Wu (Tencent YouTu Lab) · Xinghua Jiang (None) · ZhiHao Guo (Tencent YOUTU Lab) · Mingming Gong (Tencent YouTu Lab) · Haoyu Cao (Tencent Youtu Lab) · Yinsong Liu (Tencent Youtu Lab) · Deqiang Jiang (Tencent YouTu Lab) · Xing Sun (Tencent YouTu Lab)
Language-Driven Anchors for Zero-Shot Adversarial Robustness
Xiao Li (Tsinghua University) · Wei Zhang (Department of Computer Science and Technology, Tsinghua University) · Yining Liu (Harbin Institute of Technology at Weihai) · Zhanhao Hu (UC Berkeley) · Bo Zhang (Tsinghua University) · Xiaolin Hu (Tsinghua University)
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian (Google) · Lijie Fan (Massachusetts Institute of Technology) · Kaifeng Chen (Google) · Dina Katabi (Massachusetts Institute of Technology) · Dilip Krishnan (Google) · Phillip Isola (None)
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu (Fudan University) · Qi Dai (Microsoft Research Asia) · Zhi-Qi Cheng (Carnegie Mellon University) · Han Hu (Microsoft Research Asia) · Xintong Han (Huya Inc) · Zuxuan Wu (Fudan University) · Yu-Gang Jiang (Fudan University)
EVS-assisted joint Deblurring, Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
Rui Jiang (OMNIVISION) · Fangwen Tu (OMNIVISION) · Yixuan Long (OMNIVISION) · Aabhaas Vaish (OMNIVISION) · Bowen Zhou (OMNIVISION) · Qinyi Wang (OmniVision) · Wei Zhang (OMNIVISION) · Yuntan Fang (OMNIVISION) · Luis Eduardo García Capel (OMNIVISION) · Bo Mu (OMNIVISION) · Tiejun Dai (OMNIVISION) · Andreas Suess (OMNIVISION)
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano (Institute of Photogrammetry and Robotics, University of Bonn (Germany)) · Federico Magistri (Rheinische Friedrich-Wilhelms Universität Bonn) · Lucas Nunes (University of Bonn) · Jens Behley (University of Bonn) · Cyrill Stachniss (University of Bonn)
MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang (National University of Singapore) · Songhua Liu (None) · Zhenxiong Tan (National University of Singapore) · Xinchao Wang (National University of Singapore)
Towards Calibrated Multi-label Deep Neural Networks
Jiacheng Cheng (University of California, San Diego) · Nuno Vasconcelos (University of California San Diego)
Collaborating Foundation models for Domain Generalized Semantic Segmentation
Yasser Benigmim (Telecom Paris) · Subhankar Roy (University of Aberdeen) · Slim Essid (Télécom Paris) · Vicky Kalogeiton (Ecole polytechnique, IP Paris) · Stéphane Lathuilière (Télécom ParisTech)
Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
Yan Huang (Institute of Automation, Chinese Academy of Sciences) · Zhang Zhang (Institute of Automation, Chinese Academy of Sciences) · Qiang Wu (University of Technology Sydney) · Yi Zhong (Beijing Institute of Technology) · Liang Wang (CASIA)
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
Xiang Wang (Huazhong University of Science and Technology) · Shiwei Zhang (Alibaba Group) · Hangjie Yuan (Zhejiang University) · Zhiwu Qing (Huazhong University of Science and Technology, Tsinghua University) · Biao Gong (Alibaba Group) · Yingya Zhang (Alibaba Group) · Yujun Shen (The Chinese University of Hong Kong) · Changxin Gao (Huazhong University of Science and Technology) · Nong Sang (Huazhong University of Science and Technology)
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei (Fudan University) · Shiwei Zhang (Alibaba Group) · Zhiwu Qing (Huazhong University of Science and Technology, Tsinghua University) · Hangjie Yuan (Zhejiang University) · Zhiheng Liu (University of Science and Technology of China) · Yu Liu (Alibaba Group) · Yingya Zhang (Alibaba Group) · Jingren Zhou (Alibaba Group) · Hongming Shan (None)
MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization
Jimin Xu (Zhejiang University) · Tianbao Wang (Zhejiang University) · Tao Jin (Zhejiang University) · Shengyu Zhang (Zhejiang University) · Dongjie Fu (Zhejiang University) · Zhe Wang (Alibaba) · Jiangjing Lyu (None) · Chengfei Lv (Zhejiang University) · Chaoyue Niu (Shanghai Jiaotong University) · Zhou Yu (Hangzhou Dianzi University) · Zhou Zhao (Zhejiang University, Tsinghua University) · Fei Wu (Zhejiang University)
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang (Institute of Automation, Chinese Academy of Sciences) · Xiangyu Zhu (None) · Tianshuo Zhang (Institute of Automation, Chinese Academy of Sciences) · Baiqin Wang (None) · Zhen Lei (Institute of Automation, Chinese Academy of Sciences)
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
Supreeth Narasimhaswamy (Stony Brook University, New York) · Uttaran Bhattacharya (Adobe Inc.) · Xiang Chen (Adobe Research) · Ishita Dasgupta (Department of Computer Science, University of Massachusetts at Amherst) · Saayan Mitra (Adobe Research) · Minh Hoai (State University of New York, Stony Brook)
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision
Lu Ling (Purdue University) · Yichen Sheng (Purdue University) · Zhi Tu (Purdue University) · Wentian Zhao (Adobe Systems) · Cheng Xin (Rutgers University) · Kun Wan (Adobe Inc.) · Lantao Yu (Adobe Inc.) · Qianyu Guo (None) · Zixun Yu (Purdue University) · Yawen Lu (Purdue University) · Xuanmao Li (Huazhong University of Science and Technology) · Xingpeng Sun (Purdue University) · Rohan Ashok (Purdue University) · Aniruddha Mukherjee (Purdue University) · Hao Kang (Wormpex AI Research) · Xiangrui Kong (Purdue University) · Gang Hua (Wormpex AI Research) · Tianyi Zhang (Purdue University) · Bedrich Benes (Purdue University) · Aniket Bera (Purdue University)
OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
Jisoo Jeong (Qualcomm AI Research) · Hong Cai (Qualcomm AI Research) · Risheek Garrepalli (Qualcomm Inc) · Jamie Lin (Qualcomm) · Munawar Hayat (Monash University) · Fatih Porikli (Qualcomm)
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Pablo Marcos-Manchón (University of Barcelona) · Roberto Alcover-Couso (Universidad Autónoma de Madrid) · Juan SanMiguel (Universidad Autónoma de Madrid) · Jose M. Martinez (Universidad Autónoma de Madrid)
BoQ: A Place is Worth a Bag of Learnable Queries
Amar Ali-bey (Université Laval) · Brahim Chaib-draa (Université Laval) · Philippe Giguère (Université Laval)
Generalizable Face Landmarking Guided by Conditional Face Warping
Jiayi Liang (Beijing Institute of Technology) · Haotian Liu (Beijing Institute of Technology) · Hongteng Xu (Renmin University of China) · Dixin Luo (Beijing Institute of Technology)
Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
Inhee Lee (Seoul National University) · Byungjun Kim (Seoul National University) · Hanbyul Joo (Seoul National University)
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding (Zhejiang University) · Canyu Zhao (Zhejiang University) · Wen Wang (Zhejiang University) · Zhen Yang (Zhejiang University) · Zide Liu (Zhejiang University) · Hao Chen (Zhejiang University) · Chunhua Shen (Zhejiang University)
Semantic Human Mesh Reconstruction with Textures
Xiaoyu Zhan (Nanjing University) · Jianxin Yang (Nanjing University) · Yuanqi Li (Nanjing University) · Jie Guo (Nanjing University) · Yanwen Guo (Nanjing University) · Wenping Wang (Texas A&M University - College Station)
ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction
Zhicheng Zhang (Nankai University) · Junyao Hu (Nankai University) · Wentao Cheng (Nankai University) · Danda Paudel (INSAIT, Sofia University) · Jufeng Yang (None)
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
Xiaopei Wu (Zhejiang University) · Yuenan Hou (Shanghai AI Laboratory) · Xiaoshui Huang (Shanghai AI Laboratory) · Binbin Lin (Zhejiang University) · Tong He (Shanghai AI Lab) · Xinge Zhu (The Chinese University of Hong Kong) · Yuexin Ma (ShanghaiTech University) · Boxi Wu (Zhejiang University) · Haifeng Liu (Zhejiang University) · Deng Cai (Zhejiang University) · Wanli Ouyang (University of Sydney)
Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing
Jan-Nico Zaech (INSAIT Sofia, ETH Zürich) · Martin Danelljan (ETH Zurich) · Tolga Birdal (Imperial College London) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
Robust Image Denoising through Adversarial Frequency Mixup
Donghun Ryou (Seoul National University) · Inju Ha (Seoul National University) · Hyewon Yoo (Seoul National University) · Dongwan Kim (Seoul National University) · Bohyung Han (Seoul National University)
Learning Occupancy for Monocular 3D Object Detection
Liang Peng (FABU Inc) · Junkai Xu (Zhejiang University) · Haoran Cheng (College of Computer Science and Technology, Zhejiang University) · Zheng Yang (Fabu Inc) · Xiaopei Wu (Zhejiang University) · Wei Qian (Fabu Inc.) · Wenxiao Wang (Zhejiang University) · Boxi Wu (Zhejiang University) · Deng Cai (Zhejiang University)
SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Rongyuan Wu (Hong Kong Polytechnic University) · Tao Yang (Tsinghua University) · Lingchen Sun (Hong Kong Polytechnic University) · Zhengqiang ZHANG (The Hong Kong Polytechnic University) · Shuai Li (The Hong Kong Polytechnic University) · Lei Zhang (The Hong Kong Polytechnic University)
DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling
Miguel Fainstein (Universidad de Buenos Aires) · Viviana Siless (Universidad Torcuato di Tella) · Emmanuel Iarussi (Universidad Torcuato di Tella / Conicet)
Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers
Sheng Yang () · Jiawang Bai (None) · Kuofeng Gao (Tsinghua University) · Yong Yang (Tencent Security) · Yiming Li (Zhejiang University) · Shu-Tao Xia (Shenzhen International Graduate School, Tsinghua University)
DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model
Zhenghao Pan (Harbin Institute of Technology (Shenzhen)) · Haijin Zeng (IMEC & Universiteit Gent) · Jiezhang Cao (ETH Zürich) · Kai Zhang (None) · Yongyong Chen (Harbin Institute of Technology (Shenzhen))
Relightable Gaussian Codec Avatars
Shunsuke Saito (Reality Labs Research) · Gabriel Schwartz (Meta) · Tomas Simon (Meta) · Junxuan Li (Meta Reality Labs) · Giljoo Nam (Meta)
WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification
Satish Kumar (None) · Bowen Zhang (University of California, Santa Barbara) · Chandrakanth Gudavalli (University of California, Santa Barbara) · Connor Levenson (University of California, Santa Barbara) · Lacey Hughey (Smithsonian National Zoo and Conservation Biology Institute) · Jared Stabach (Smithsonian Conservation Biology Institute) · Irene Amoke (Kenya Wildlife Trust) · Gordon Ojwang (University of Groningen) · Joseph Mukeka (Wildlife Research and Training Institute) · Howard Frederick (Tanzania Wildlife Research Institute) · Stephen Mwiu (Wildlife Research and Training Institute) · Joseph Ochieng Ogutu (Universität Hohenheim) · B S Manjunath (UC Santa Barbara)
Pre-training Vision Models with Mandelbulb Variations
Benjamin N. Chiche (Rist Inc.) · Yuto Horikawa (Rist) · Ryo Fujita (Kyoto University)
Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models
Luo Jiayun (Nanyang Technological University) · Siddhesh Khandelwal (None) · Leonid Sigal (University Of British Columbia) · Boyang Li (Nanyang Technological University)
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
Keqi Chen (University of Strasbourg) · Vinkle Srivastav (University of Strasbourg) · Nicolas Padoy (University of Strasbourg)
Context-Aware Integration of Language and Visual References for Natural Language Tracking
Yanyan Shao (None) · Shuting He (Nanyang Technological University) · Qi Ye (Zhejiang University) · Yuchao Feng (None) · Wenhan Luo (Sun Yat-sen University) · Jiming Chen (Zhejiang University)
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan (Shanghai Jiaotong University) · Lixin Yang (Shanghai Jiao Tong University) · Yifei Zhao (Shanghai Jiaotong University) · Kangrui Mao (Shanghai Jiao Tong University) · Hanlin Xu (Shanghai Jiaotong University) · Zenan Lin (South China University of Technology) · Kailin Li (Shanghai Jiaotong University) · Cewu Lu (Shanghai Jiao Tong University)
LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example
Soyeon Yoon (Korea Advanced Institute of Science & Technology) · Kwan Yun (Korea Advanced Institute of Science & Technology) · Kwanggyoon Seo (KAIST) · Sihun Cha (Korea Advanced Institute of Science and Technology) · Jung Eun Yoo (Korea Advanced Institute of Science & Technology) · Junyong Noh (Korea Advanced Institute of Science and Technology)
ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu (Shanghai Jiao Tong University) · Yizhou Zhou (WeChat AI) · Yichao Yan (Shanghai Jiao Tong University) · Xin Jin (Eastern Institute of Technology, Ningbo) · Wenhan Zhu (None) · Fengyun Rao (WeChat, Tencent Inc.) · Xiaokang Yang (Shanghai Jiao Tong University, China) · Wenjun Zeng (None)
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
Zi-Ting Chou (National Taiwan University) · Sheng-Yu Huang (National Taiwan University) · I-Jieh Liu (National Taiwan University) · Yu-Chiang Frank Wang (NVIDIA)
LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling
Jiaheng Liu (Beihang University) · Jianhao Li (Beihang University) · Kaisiyuan Wang (University of Sydney) · Hongcheng Guo (Beihang University) · Jian Yang (Alibaba Group) · Junran Peng (Institute of Automation, Chinese Academy of Sciences) · Ke Xu (Beijing University of Aeronautics and Astronautics) · Xianglong Liu (BUAA) · Jinyang Guo (Beijing University of Aeronautics and Astronautics)
DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization
Zeqin Yu (Sun Yat-Sen University) · Jiangqun Ni (Sun Yat-Sen University) · Yuzhen Lin (Shenzhen University) · Haoyi Deng (Shenzhen University) · Bin Li (Shenzhen University)
Initialization Matters for Adversarial Transfer Learning
Andong Hua (University of California, Santa Barbara) · Jindong Gu (University of Oxford & Google Research) · Zhiyu Xue (University of California, Santa Barbara) · Nicholas Carlini (None) · Eric Wong (University of Pennsylvania) · Yao Qin (University of California, Santa Barbara)
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li (University of Munich) · Chengzhi Shen (Technische Universität München) · Philip H.S. Torr (University of Oxford) · Volker Tresp (Ludwig-Maximilians-Universität München) · Jindong Gu (University of Oxford & Google Research)
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu (None) · Cairong Zhang (ByteDance Inc.) · Yitong Wang (ByteDance Inc) · Jiahao Wang (Shanghai AI Lab) · Yujiu Yang (Tsinghua University) · Yansong Tang (Tsinghua University)
On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
Peng Sun (Zhejiang University & Westlake University) · Bei Shi (Northwestern Polytechnical University) · Daiwei Yu (Hangzhou City University) · Tao Lin (Westlake University)
Dr. Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering
Yichen Sheng (Purdue University) · Zixun Yu (Purdue University) · Lu Ling (Purdue University) · Zhiwen Cao (Adobe Systems) · Xuaner Zhang (Adobe) · Xin Lu (Adobe Inc.) · Ke Xian (Nanyang Technological University) · Haiting Lin (Adobe Systems) · Bedrich Benes (Purdue University)
BrainWash: A Poisoning Attack to Forget in Continual Learning
Ali Abbasi (Vanderbilt University) · Parsa Nooralinejad (University of California, Davis) · Hamed Pirsiavash (University of California, Davis) · Soheil Kolouri (Vanderbilt University)
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu (University of Washington) · Wisdom Ikezogwo (University of Washington) · Fatemeh Ghezloo (University of Washington) · Ranjay Krishna (University of Washington) · Linda Shapiro (University of Washington)
Segment and Caption Anything
Xiaoke Huang (Shenzhen International Graduate School, Tsinghua University) · Jianfeng Wang (Microsoft) · Yansong Tang (Tsinghua University) · Zheng Zhang (Microsoft) · Han Hu (Microsoft Research Asia) · Jiwen Lu (Tsinghua University) · Lijuan Wang (Microsoft) · Zicheng Liu (Microsoft)
Selective nonlinearities removal from digital signals
Krzysztof Maliszewski (University of Canterbury) · Magdalena Urbanska (Massey University) · Varvara Vetrova (University of Canterbury) · Sylwia Kolenderska (University of Canterbury)
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Lingjun Zhao (University of Michigan - Ann Arbor) · Jingyu Song (University of Michigan - Ann Arbor) · Katherine Skinner (University of Michigan - Ann Arbor)
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
Mukund Varma T (University of California, San Diego) · Peihao Wang (University of Texas, Austin) · Zhiwen Fan (University of Texas, Austin) · Zhangyang Wang (University of Texas at Austin) · Hao Su (UCSD) · Ravi Ramamoorthi (None)
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
JunDa Cheng (Huazhong University of Science and Technology) · Wei Yin (Shenzhen DJI Sciences and Technologies Ltd.) · Kaixuan Wang (Hong Kong University of Science and Technology) · Xiaozhi Chen (DJI Innovations) · Shijie Wang (Huazhong University of Science and Technology) · Xin Yang (Huazhong University of Science and Technology)
Self-correcting LLM-controlled Diffusion
Tsung-Han Wu (University of California, Berkeley) · Long Lian (University of California, Berkeley) · Joseph Gonzalez (University of California - Berkeley) · Boyi Li (UC Berkeley / NVIDIA) · Trevor Darrell (Electrical Engineering & Computer Science Department)
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
Fan Zhang (Beijing Institute of Technology) · Shaodi You (Kyushu University) · Yu Li (International Digital Economy Academy) · Ying Fu (None)
Atom-Level Optical Chemical Structure Recognition with Limited Supervision
Martijn Oldenhof (KU Leuven) · Edward De Brouwer (Yale University) · Adam Arany (KU Leuven) · Yves Moreau (University of Leuven)
Scalable 3D Registration via Truncated Entry-wise Absolute Residuals
Tianyu Huang (The Chinese University of Hong Kong) · Liangzu Peng (Johns Hopkins University) · Rene Vidal (Johns Hopkins University) · Yun-Hui Liu (The Chinese University of Hong Kong)
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao (Nanjing University) · Bingkun Huang (Nanjing University) · Sen Xing (Tsinghua University) · Gangshan Wu (Nanjing University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Limin Wang (Nanjing University)
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang (Korea Advanced Institute of Science & Technology) · Jihoon Kim (None) · Junseok Ahn (Korea Advanced Institute of Science and Technology) · Doyeop Kwak (Korea Advanced Institute of Science & Technology) · Hongsun Yang (42dot) · Yooncheol Ju (42dot) · ILHWAN KIM (None) · Byeong-Yeol Kim (42dot) · Joon Chung (KAIST)
Generative Image Dynamics
Zhengqi Li (Google) · Richard Tucker (Google) · Noah Snavely (Google / Cornell) · Aleksander Holynski (UC Berkeley & Google Research)
Continual Forgetting for Pre-trained Vision Models
Hongbo Zhao (Institute of Automation, Chinese Academy of Sciences) · Bolin Ni (Institute of Automation, Chinese Academy of Sciences) · Junsong Fan (Centre for Artificial Intelligence and Robotics (CAIR) Hong Kong Institute of Science & Innovation Chinese Academy of Sciences) · Yuxi Wang (Institute of Automation, Chinese Academy of Sciences) · Yuntao Chen (CAIR, HKISI, CAS) · Gaofeng Meng (Institute of Automation, Chinese Academy of Sciences) · Zhaoxiang Zhang (Institute of Automation, Chinese Academy of Sciences)
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Fengda Zhang (Zhejiang University) · Qianpei He (Zhejiang University) · Kun Kuang (Zhejiang University) · Jiashuo Liu (Tsinghua University) · Long Chen (HKUST) · Chao Wu (Zhejiang University) · Jun Xiao (Zhejiang University) · Hanwang Zhang (Nanyang Technological University)
CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs
Yingji Zhong (None) · Lanqing Hong (Huawei Technologies Ltd.) · Zhenguo Li (Huawei) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology)
Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation
Tianshui Chen (Guangdong University of Technology) · Jianman Lin (None) · Zhijing Yang (Guangdong University of Technology) · Chunmei Qing (South China University of Technology) · Liang Lin (Sun Yat-sen University)
A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint
Xiaofeng Cong (Southeast University) · Jie Gui (Southeast University) · Jing Zhang (The University of Sydney) · Junming Hou (Southeast University) · Hao Shen (Hefei University of Technology)
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao (National University of Singapore) · Zhan Tong (Ant Group) · Kevin Qinghong Lin (National University of Singapore) · Joya Chen (National University of Singapore) · Mike Zheng Shou (National University of Singapore)
THRONE: A Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul (University of Oxford) · Zhizhong Li (Amazon) · Hao Yang (Amazon) · Yonatan Dukler (AWS AI) · Ashwin Swaminathan (University of Maryland, College Park) · CJ Taylor (Amazon AWS) · Stefano Soatto (AWS)
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Amirhossein Habibian (Qualcomm AI Research) · Amir Ghodrati (Qualcomm AI Research) · Noor Fathima (Qualcomm Inc) · Guillaume Sautiere (Qualcomm Inc) · Risheek Garrepalli (Qualcomm Inc) · Fatih Porikli (Qualcomm) · Jens Petersen (Qualcomm AI Research)
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Huan Ling (Nvidia, University of Toronto) · Seung Wook Kim (NVIDIA) · Antonio Torralba (MIT) · Sanja Fidler (Department of Computer Science, University of Toronto) · Karsten Kreis (NVIDIA)
Inlier Confidence Calibration for Point Cloud Registration
Yongzhe Yuan (Xidian University) · Yue Wu (Xidian University) · Xiaolong Fan (Xidian University) · Maoguo Gong (Xidian University) · Qiguang Miao (Xidian University) · Wenping Ma (Xidian University)
Memory-Scalable and Simplified Functional Map Learning
Robin Magnet (École Polytechnique) · Maks Ovsjanikov (Ecole Polytechnique, France)
ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF
Han Ling (Nanjing University of Science and Technology) · Quansen Sun (Nanjing University of Science and Technology) · Yinghui Sun (Nanjing University of Science and Technology) · Xian Xu (Southeast Community College Area) · Xingfeng Li (Nanjing University of Science and Technology)
IReNe: Instant Recoloring of Neural Radiance Fields
Alessio Mazzucchelli (Arquimea Research Center) · Adrian Garcia-Garcia (Arquimea Research Center) · Elena Garces (Universidad Rey Juan Carlos) · Fernando Rivas-Manzaneque (None) · Francesc Moreno-Noguer (Universidad Politécnica de Cataluna) · Adrian Penate-Sanchez (Universidad de Las Palmas de Gran Canaria)
HardMo: A Large-Scale Hardcase Dataset for Motion Capture
Jiaqi Liao (None) · Chuanchen Luo (Institute of Automation, Chinese Academy of Sciences) · Yinuo Du (Beijing University of Posts and Telecommunications) · Yuxi Wang (Institute of Automation, Chinese Academy of Sciences) · Xu-Cheng Yin (University of Science and Technology Beijing) · Man Zhang (None) · Zhaoxiang Zhang (Institute of Automation, Chinese Academy of Sciences) · Junran Peng (Institute of Automation, Chinese Academy of Sciences)
HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions
Hao Xu (None) · Li Haipeng (None) · Yinqiao Wang (The Chinese University of Hong Kong) · Shuaicheng Liu (None) · Chi-Wing Fu (The Chinese University of Hong Kong)
An Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains
George Eskandar (Universität Stuttgart)
Constrained Layout Generation with Factor Graphs
Mohammed Haroon Dupty (National University of Singapore) · Yanfei Dong (PayPal Inc.) · Sicong Leng (Nanyang Technological University) · Guoji Fu (National University of Singapore) · Yong Liang Goh (National University of Singapore) · Wei Lu (Singapore University of Technology and Design) · Wee Sun Lee (National University of Singapore)
FastMAC: Stochastic Spectral Sampling of Correspondence Graph
Yifei Zhang (University of Chinese Academy of Sciences) · Hao Zhao (Tsinghua University) · Hongyang Li (Shanghai AI Lab) · Siheng Chen (Shanghai Jiao Tong University)
Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID
Wentao Tan (South China University of Technology) · Changxing Ding (South China University of Technology) · Jiayu Jiang (South China University of Technology) · Fei Wang (South China University of Technology) · Yibing Zhan (JD Explore Academy) · Dapeng Tao (Yunnan University)
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
guo (None) · Tianwei Lin (Horizon Robotics)
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu (Google) · Xia Hu (Research, Google) · Yaqing Wang (Research, Google) · Bo Pang (Google) · Radu Soricut (Google)
Distilling CLIP with Dual Guidance for Learning Discriminative Human Body Shape Representation
Feng Liu (Michigan State University) · Minchul Kim (Michigan State University) · Zhiyuan Ren (Michigan State University) · Xiaoming Liu (None)
Observation-Guided Diffusion Probabilistic Models
Junoh Kang (Seoul National University) · Jinyoung Choi (Seoul National University) · Sungik Choi (LG AI Research) · Bohyung Han (Seoul National University)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu (Xiamen University) · Yiwei Ma (Xiamen University) · Xiaoqing Zhang (Xiamen University) · Haowei Wang (Xiamen University) · Jiayi Ji (Xiamen University) · Xiaoshuai Sun (Xiamen University) · Rongrong Ji (Xiamen University)
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Li Jiang (Max Planck Institute for Informatics) · Xiaoyang Wu (The University of Hong Kong) · Zhuotao Tian (The Chinese University of Hong Kong) · Bohao Peng (The Chinese University of Hong Kong) · Hengshuang Zhao (The University of Hong Kong) · Jiaya Jia (The Chinese University of Hong Kong)
Fully Exploiting Every Real Sample: Super-Pixel Sample Gradient Model Stealing
Yunlong Zhao () · Xiaoheng Deng (Central South University) · Yijing Liu (Zhejiang University) · Xinjun Pei (None) · Jiazhi Xia (Central South University) · Wei Chen (State key laboratory of CAD&CG)
LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
Yuxing Duan (None)
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
Chenlu Zhan (None) · Gaoang Wang (Zhejiang University) · Yu LIN (Zhejiang University) · Hongwei Wang (Zhejiang University) · Jian Wu (Zhejiang University)
DePT: Decoupled Prompt Tuning
Ji Zhang (University of Electronic Science and Technology of China) · Shihan Wu (University of Electronic Science and Technology of China) · Lianli Gao (University of Electronic Science and Technology of China, Tsinghua University) · Heng Tao Shen (University of Electronic Science and Technology of China) · Jingkuan Song (University of Electronic Science and Technology of China)
A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion
Feng Yu (University of Minnesota - Twin Cities) · Teng Zhang (University of Central Florida) · Gilad Lerman (University of Minnesota, Minneapolis)
Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation
Xin Fan (Dalian University of Technology) · Xiaolin Wang (Dalian University of Technology) · Jiaxin Gao (Dalian University of Technology) · Jia Wang (Dalian University of Technology) · Zhongxuan Luo (Dalian University of Technology) · Risheng Liu (Dalian University of Technology)
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan (Zhejiang University) · Wentong Li (College of Computer Science and Technology, Zhejiang University) · Jian Liu (Ant Group) · Dongqi Tang (Ant Group) · Xinjie Luo (Zhejiang University) · Chi Qin (Microsoft) · Lei Zhang (The Hong Kong Polytechnic University) · Jianke Zhu (Zhejiang University)
NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
Sicheng Li (Zhejiang University) · Hao Li (None) · Yiyi Liao (Zhejiang University) · Lu Yu (Zhejiang University)
Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
Yuto Enyo (Kyoto University) · Ko Nishino (Kyoto University)
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
Hyelin Nam (Korea Advanced Institute of Science & Technology) · Gihyun Kwon (Korea Advanced Institute of Science & Technology) · Geon Yeong Park (Korea Advanced Institute of Science and Technology) · Jong Chul Ye (Korea Advanced Institute of Science and Technology)
Domain Prompt Learning with Quaternion Networks
Qinglong Cao (Shanghai Jiao Tong University) · Zhengqin Xu (Shanghai Jiaotong University) · Yuntian Chen (Eastern Institute for Advanced Study) · Chao Ma (Shanghai Jiao Tong University) · Xiaokang Yang (Shanghai Jiao Tong University, China)
Towards More Unified In-context Visual Understanding
Dianmo Sheng (University of Science and Technology of China) · Dongdong Chen (Microsoft Research) · Zhentao Tan (Alibaba DAMO Academy; University of Science and Technology of China) · Qiankun Liu (Beijing Institute of Technology) · Qi Chu (University of Science and Technology of China) · Jianmin Bao (Microsoft) · Tao Gong (University of Science and Technology of China) · Bin Liu (None) · Shengwei Xu (Beijing Electronic Science and Technology Institute) · Nenghai Yu (University of Science and Technology of China)
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao (Harbin Institute of Technology) · Renjie Pi (None) · Jianhua Han (Huawei Technologies Ltd.) · Xiaodan Liang (Sun Yat-sen University) · Hang Xu (Huawei Noah's Ark Lab) · Wei Zhang (Huawei Technologies Ltd.) · Zhenguo Li (Huawei) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology)
Uncertainty-Guided Never-Ending Learning to Drive
Lei Lai (Boston University) · Eshed Ohn-Bar (Boston University) · Sanjay Arora (Red Hat, Inc.) · John Yi (Boston University)
PlatoNeRF: 3D Reconstruction in Plato’s Cave via Single-View Two-Bounce Lidar
Tzofi Klinghoffer (Massachusetts Institute of Technology) · Xiaoyu Xiang (Meta) · Siddharth Somasundaram (Massachusetts Institute of Technology) · Yuchen Fan (Facebook) · Christian Richardt (Meta Reality Labs) · Ramesh Raskar (Massachusetts Institute of Technology) · Rakesh Ranjan ()
Koala: Key frame-conditioned long video-LLM
Reuben Tan (Boston University) · Ximeng Sun (Boston University) · Ping Hu (University of Electronic Science and Technology of China) · Jui-Hsien Wang (Adobe Systems) · Hanieh Deilamsalehy (None) · Bryan A. Plummer (None) · Bryan Russell (Adobe Research) · Kate Saenko (Meta / Boston University)
DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
Biwen Lei (Alibaba Group) · Kai Yu (None) · Mengyang Feng (Alibaba Group) · Miaomiao Cui (Alibaba Group) · Xuansong Xie (Alibaba Group)
ZeroShape: Regression-based Zero-shot Shape Reconstruction
Zixuan Huang (University of Illinois Urbana-Champaign) · Stefan Stojanov (Georgia Institute of Technology) · Anh Thai (Georgia Institute of Technology) · Varun Jampani (Google Research) · James Rehg (None)
Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning
Ziming Hong (The University of Sydney) · Li Shen (JD Explore Academy) · Tongliang Liu (Mohamed bin Zayed University of Artificial Intelligence)
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Yifan Bai (Xi'an Jiaotong University) · Zeyang Zhao (Xi'an Jiaotong University) · Yihong Gong (Xi'an Jiaotong University) · Xing Wei (None)
DPHMs: Diffusion Parametric Head Models for Depth-based Tracking
Jiapeng Tang (Technische Universität München) · Angela Dai () · Yinyu Nie (Huawei Technologies Ltd.) · Lev Markhasin (None) · Justus Thies (Max-Planck Institute for Intelligent Systems) · Matthias Nießner (Technical University of Munich)
CNC-Net: Self-Supervised Learning for CNC Machining Operations
Mohsen Yavartanoo (None) · Sangmin Hong (Seoul National University) · Reyhaneh Neshatavar (None) · Kyoung Mu Lee (Seoul National University)
High-Quality Facial Geometry and Appearance Capture at Home
Yuxuan Han (Tsinghua University) · Junfeng Lyu (School of Software, Tsinghua University) · Feng Xu (Tsinghua University)
Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data
Yu Deng (Xiaobing.ai) · Duomin Wang () · Xiaohang Ren (Xiaobing.ai) · Xingyu Chen (Xiaobing.ai) · Baoyuan Wang (Xiaobing.ai)
Efficient Scene Recovery Using Luminous Flux Prior
ZhongYu Li (University of Science and Technology of China) · Lei Zhang (University of Science and Technology of China)
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen (University of Arkansas - Fayetteville) · Thanh-Dat Truong (University of Arkansas) · Xuan-Bac Nguyen (None) · Ashley Dowling (University of Arkansas - Fayetteville) · Xin Li (State University of New York at Albany) · Khoa Luu (University of Arkansas)
IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
Yizhi Song (Purdue University) · Zhifei Zhang (Adobe Research) · Zhe Lin (Adobe Research) · Scott Cohen (Adobe Systems) · Brian Price (Adobe Research) · Jianming Zhang (Adobe Systems) · Soo Ye Kim (Adobe Systems) · He Zhang (Adobe Systems) · Wei Xiong (Adobe Systems) · Daniel Aliaga (Purdue University)
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
Zhuohong Li () · Wei He (Wuhan University) · Jiepan Li (None) · Fangxiao Lu (Wuhan University) · Hongyan Zhang (China University of Geosciences Wuhan)
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Siteng Huang (Zhejiang University & Westlake University) · Biao Gong (Alibaba Group) · Yutong Feng (Alibaba Group) · Zhang Min (Westlake University) · Yiliang Lv (Gientech AIL) · Donglin Wang (Westlake University)
Hyperbolic Anomaly Detection
Huimin Li (Beihang University) · Zhentao Chen (Beihang University) · Yunhao Xu (Beihang University) · Junlin Hu (Beihang University)
Multiple View Geometry Transformers for 3D Human Pose Estimation
Ziwei Liao (University of Toronto) · Jialiang Zhu (Southeast University) · Chunyu Wang (Microsoft) · Han Hu (Microsoft Research Asia) · Steven L. Waslander (University of Toronto)
Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Xinting Liao (Zhejiang University) · Weiming Liu (Zhejiang University) · Chaochao Chen (Zhejiang University) · Pengyang Zhou (Zhejiang University) · Fengyuan Yu (Zhejiang University) · Huabin Zhu (Zhejiang University) · Binhui Yao (University of Canberra) · Tao Wang (Midea Group) · Xiaolin Zheng (Zhejiang University) · Yanchao Tan (Fuzhou University)
SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Soubhik Sanyal (None) · Partha Ghosh (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Jinlong Yang (Google) · Michael J. Black (University of Tübingen) · Justus Thies (Max-Planck Institute for Intelligent Systems) · Timo Bolkart (Google)
Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
Ziyu Shan (Shanghai Jiao Tong University) · Yujie Zhang (Shanghai Jiao Tong University) · Qi Yang (Tencent MediaLab) · Haichen Yang (Shanghai Jiaotong University) · Yiling Xu (None) · Jenq-Neng Hwang (None) · Xiaozhong Xu (Tencent Media Lab) · Shan Liu (Tencent Media Lab)
Training-free Pretrained Model Merging
Zhengqi Xu (Zhejiang University) · Ke Yuan (Zhejiang University) · Huiqiong Wang (Zhejiang University) · Yong Wang (State Grid Shandong Electronic Power Company) · Mingli Song (Zhejiang University) · Jie Song (Zhejiang University)
Anatomically Constrained Implicit Face Models
Prashanth Chandran (None) · Gaspard Zoss (Disney Research, Disney)
Revisiting Global Translation Estimation with Feature Tracks
Peilin Tao (None) · Hainan Cui (Institute of Automation, Chinese Academy of Sciences) · Mengqi Rong (Institute of Automation, Chinese Academy of Sciences) · Shuhan Shen (Institute of Automation, Chinese Academy of Sciences)
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang (Indiana University, Bloomington) · Feng Cheng (University of North Carolina at Chapel Hill) · Gedas Bertasius (UNC Chapel Hill)
WinSyn: A High Resolution Testbed for Synthetic Data
Tom Kelly (King Abdullah University of Science and Technology) · John Femiani (None) · Peter Wonka (KAUST)
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou (Huazhong University of Science and Technology) · Yicong Liu (Huazhong University of Science and Technology) · Yiman Hu (Huazhong University of Science and Technology) · Yuhua Li (Huazhong University of Science and Technology) · Ruixuan Li (Huazhong University of Science and Technology)
Neural Super-Resolution for Real-time Rendering with Radiance Demodulation
Jia Li (Shandong University) · Ziling Chen (Shandong University) · Xiaolong Wu (None) · Lu Wang (Shandong University) · Beibei Wang (Nankai University) · Lei Zhang (The Hong Kong Polytechnic University)
Noisy One-point Homographies are Surprisingly Good
Yaqing Ding (None) · Jonathan Astermark (Lund University) · Magnus Oskarsson (Lund University) · Viktor Larsson (Lund University)
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma (Massachusetts Institute of Technology) · Varun Jampani (Google Research) · Yuanzhen Li (Massachusetts Institute of Technology) · Xuhui Jia (Google) · Dmitry Lagun (Google) · Fredo Durand (Massachusetts Institute of Technology) · William Freeman (MIT and Google) · Mark Matthews (Google)
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang (Nanyang Technological University) · Linjie Li (Microsoft) · Kevin Lin (Microsoft) · Yuanhao Zhai (State University of New York at Buffalo) · Chung-Ching Lin (Microsoft) · Zhengyuan Yang (Microsoft) · Hanwang Zhang (Nanyang Technological University) · Zicheng Liu (Microsoft) · Lijuan Wang (Microsoft)
PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference
Xiao Tang (None) · Min Yang (None) · Penghui Sun (Samsung R&D Institute) · Hui Li (Samsung R&D Institute China Xi'an (SRCX)) · Yuchao Dai (Northwestern Polytechnical University) · Feng Zhu (None) · Hojae Lee (None)
FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning
Junyuan Zhang (University of Hong Kong) · Shuang Zeng (The University of Hong Kong) · Miao Zhang (New York University) · Runxi Wang (Beijing University of Aeronautics and Astronautics) · Feifei Wang (Stanford University) · Yuyin Zhou (UC Santa Cruz) · Paul Pu Liang (Carnegie Mellon University) · Liangqiong Qu (The University of Hong Kong)
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang (Rochester Institute of Technology) · Guohao Sun (Rochester Institute of Technology) · Pichao Wang (Amazon) · Dongfang Liu (Rochester Institute of Technology) · Sohail Dianat (Rochester Institute of Technology) · MAJID RABBANI (Rochester Institute of Technology) · Raghuveer Rao (DEVCOM Army Research Laboratory) · ZHIQIANG TAO (Rochester Institute of Technology)
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard (None) · Zhixi Cai (None) · Shiki Wen (Monash University) · Hamid Rezatofighi (Monash University)
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin (Peking University) · Ryuichi Takanobu (miHoYo) · Cai Zhang (Nanrui Group Co., Ltd) · Xiaochun Cao (Sun Yat-sen University) · Li Yuan (Peking University)
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Xun Lin (Beihang University) · Shuai Wang (Beihang University) · RIZHAO CAI (Nanyang Technological University) · Yizhong Liu (Beihang University) · Ying Fu (None) · Wenzhong Tang (Beihang University) · Zitong YU (Nanyang Technological University) · Alex C. Kot (Nanyang Technological University)
Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation
Qinghe Ma (Nanjing University) · Jian Zhang (Nanjing University) · Lei Qi (Southeast University) · Qian Yu (Shandong Women's University) · Yinghuan Shi (Nanjing University) · Yang Gao (Nanjing University)
Universal Novelty Detection through Adaptive Contrastive Learning
Hossein Mirzaei (Sharif University of Technology) · Mojtaba Nafez (Sharif University of Technology) · Mohammad Jafari (Sharif University of Technology) · Mohammad Soltani (Sharif University of Technology) · Mohammad Azizmalayeri (Amsterdam UMC) · Jafar Habibi (Sharif University of Technology) · Mohammad Sabokrou (Okinawa Institute of Science and Technology (OIST)) · Mohammad Rohban (Sharif University of Technology)
LAMP: Learn A Motion Pattern for Few-Shot Video Generation
Rui-Qi Wu (Nankai University) · Liangyu Chen (Megvii Technology Inc.) · Tong Yang (Fudan University) · Chun-Le Guo (None) · Chongyi Li () · Xiangyu Zhang (MEGVII Technology)
CLiC: Concept Learning in Context
Mehdi Safaee (SFU GrUVi Lab) · Aryan Mikaeili (Simon Fraser University) · Or Patashnik (Tel Aviv University) · Daniel Cohen-Or (Google) · Ali Mahdavi Amiri (Simon Fraser University)
Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
Jiawen Li (Tsinghua University) · Yuxuan Chen (Tsinghua University) · Hongbo Chu (None) · Sun Qiehe (Tsinghua University) · Tian Guan (Graduate School at Shenzhen, Tsinghua University) · Anjia Han (Sun Yat-sen University) · Yonghong He (Tsinghua University)
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
Hao Fei (National University of Singapore) · Shengqiong Wu (National University of Singapore) · Wei Ji (None) · Hanwang Zhang (Nanyang Technological University) · Tat-seng Chua (National University of Singapore)
LEAD: Exploring Logit Space Evolution for Model Selection
Zixuan Hu (None) · Xiaotong Li (Peking University) · SHIXIANG TANG (The Chinese University of Hong Kong) · Jun Liu (Singapore University of Technology and Design (SUTD)) · Yichun Hu (Peking University) · Ling-Yu Duan (Peking University)
Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency
Yuqi Zhang (Sichuan University) · Han Luo (Sichuan University) · Yinjie Lei (Sichuan University)
MR-VNet: Media Restoration using Volterra Networks
Siddharth Roheda (Samsung Research) · Amit Unde (SRIB Bangalore) · Loay Rashid (Samsung Research Institute Bangalore)
WonderJourney: Going from Anywhere to Everywhere
Hong-Xing Yu (Computer Science Department, Stanford University) · Haoyi Duan (Stanford University) · Junhwa Hur (Google) · Kyle Sargent (Computer Science Department, Stanford University) · Michael Rubinstein (Google) · William Freeman (MIT and Google) · Forrester Cole (Google) · Deqing Sun (Google) · Noah Snavely (Google / Cornell) · Jiajun Wu (Stanford University) · Charles Herrmann (Google)
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets
Youngju Na (KAIST) · Woo Jae Kim (Korea Advanced Institute of Science and Technology (KAIST)) · Kyu Han (Korea Advanced Institute of Science & Technology) · Suhyeon Ha (Korea Advanced Institute of Science and Technology) · Sung-Eui Yoon (KAIST)
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang (Shanghai Jiao Tong University) · Yiren Song (Shanghai Jiaotong University) · Jiaming Liu (Xiaohongshu) · Rui Wang (Beijing University of Posts and Telecommunications) · Jinpeng Yu (None) · Hao Tang (ETH Zurich and CMU) · Huaxia Li (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Xu Tang (Shanghaitech University) · Yao Hu (Zhejiang University, Tsinghua University) · Han Pan (Shanghai Jiao Tong University) · Zhongliang Jing (Shanghai Jiao Tong University)
Few-shot Learner Parameterization by Diffusion Time-steps
Zhongqi Yue (Nanyang Technological University) · Pan Zhou (Sea Group) · Richang Hong (Hefei University of Technology) · Hanwang Zhang (Nanyang Technological University) · Qianru Sun (None)
Global and Hierarchical Geometry Consistency Priors for Few-shot NeRFs in Indoor Scenes
Xiaotian Sun (Xiamen University) · Qingshan Xu (Nanyang Technological University) · Xinjie Yang (Xiamen University) · Yu Zang (Xiamen University) · Cheng Wang (Xiamen University)
Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
Simon Niedermayr (Technical University of Munich) · Josef Stumpfegger (Technische Universität München) · Rüdiger Westermann (Technische Universität München)
The STVchrono Dataset: Towards Continuous Change Recognition in Time
Yanjun Sun (Keio University) · Yue Qiu (AIST, National Institute of Advanced Industrial Science and Technology) · Mariia Khan (Edith Cowan University) · Fumiya Matsuzawa (AIST, University of Tsukuba) · Kenji Iwata (AIST, National Institute of Advanced Industrial Science and Technology)
SPIN: Simultaneous Perception, Interaction and Navigation
Shagun Uppal (Carnegie Mellon University) · Ananye Agarwal (Carnegie Mellon University) · Haoyu Xiong (Carnegie Mellon University) · Kenneth Shaw (Carnegie Mellon University) · Deepak Pathak (Carnegie Mellon University)
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Bin Xie (Tianjin University) · Jiale Cao (Tianjin University) · Jin Xie (Chongqing University) · Fahad Shahbaz Khan (MBZUAI; Linköping University) · Yanwei Pang (Tianjin University)
Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection
Ke Li (Xidian University) · Di Wang (Xidian University) · Zhangyuan Hu (Xidian University) · Wenxuan Zhu (Xidian University) · Shaofeng Li (None) · Quan Wang (Xidian University)
Motion Blur Decomposition with Cross-shutter Guidance
Xiang Ji (The University of Tokyo) · Haiyang Jiang (None) · Yinqiang Zheng (None)
Real-time Acquisition and Reconstruction of Dynamic Volumes with Neural Structured Illumination
Yixin Zeng (Zhejiang University) · Zoubin Bi (State Key Laboratory of CAD&CG, Zhejiang University) · Yin Mingrui (Zhejiang University) · Xiang Feng (Zhejiang University) · Kun Zhou (Zhejiang University) · Hongzhi Wu (Zhejiang University)
MV-Adapter: Exploring Parameter Efficient Learning for Video Text Retrieval
Bowen Zhang (ByteDance) · Xiaojie Jin (ByteDance Inc./TikTok) · Weibo Gong (ByteDance) · Kai Xu (University of Chinese Academy of Sciences) · Xueqing Deng (ByteDance Research) · Peng Wang (ByteDance US AILab) · Zhao Zhang (Hefei University of Technology) · Xiaohui Shen (ByteDance) · Jiashi Feng (ByteDance)
Mind marginal non-crack regions: Clustering-inspired representation learning for crack segmentation
Zhuangzhuang Chen (Shenzhen University) · Zhuonan Lai (Shenzhen University) · Jie Chen (Shenzhen University) · Jianqiang Li (Shenzhen University)
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao (Zhejiang University) · Qianqian Wang (Cornell University) · Shangzhan Zhang (Zhejiang University) · Nan Xue (Ant Group) · Sida Peng (None) · Yujun Shen (The Chinese University of Hong Kong) · Xiaowei Zhou (None)
FreePoint: Unsupervised Point Cloud Instance Segmentation
Zhikai Zhang (Wuhan University) · Jian Ding (None) · Li Jiang (Max Planck Institute for Informatics) · Dengxin Dai () · Gui-Song Xia (Wuhan University)
Perceptual Assessment and Optimization of HDR Image Rendering
Peibei Cao (City University of Hong Kong) · Rafal Mantiuk (University of Cambridge) · Kede Ma (City University of Hong Kong)
Programmable Motion Generation for Open-set Motion Control Tasks
Hanchao Liu (Tsinghua University) · Xiaohang Zhan (Tencent) · Shaoli Huang (Tencent AI Lab) · Tai-Jiang Mu (Tsinghua University) · Ying Shan (Tencent)
Projecting Trackable Thermal Patterns for Dynamic Computer Vision
Mark Sheinin (Weizmann Institute of Science) · Aswin C. Sankaranarayanan (Carnegie Mellon University) · Srinivasa G. Narasimhan (Carnegie Mellon University)
Overcoming Generic Knowledge Loss with Selective Parameter Update
Wenxuan Zhang (King Abdullah University of Science and Technology) · Paul Janson (Concordia University/ MILA) · Rahaf Aljundi (Toyota Motor Europe) · Mohamed Elhoseiny (KAUST)
EventPS: Real-Time Photometric Stereo Using an Event Camera
Bohan Yu (None) · Jieji Ren (Shanghai Jiao Tong University) · Jin Han () · Feishi Wang (Peking University) · Jinxiu Liang (Peking University) · Boxin Shi (Peking University)
Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction
Jinzhi Zheng (University of Chinese Academy of Sciences) · Heng Fan (University of North Texas) · Libo Zhang (Institute of Software Chinese Academy of Sciences)
Open-Vocabulary 3D Semantic Segmentation with Foundation Models
Li Jiang (Max Planck Institute for Informatics) · Shaoshuai Shi (Saarland Informatics Campus, Max-Planck Institute) · Bernt Schiele (Max Planck Institute for Informatics)
Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Ashish Kumar (Indian Institute of Technology, Kanpur) · Daneul Kim (Seoul National University) · Jaesik Park (Seoul National University) · Laxmidhar Behera (Indian Institute of Technology, Kanpur)
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han (Sichuan University) · Fangrui Zhu (Northeastern University) · Qianru Lao (Harvard University) · Huaizu Jiang (Northeastern University)
Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Zi-Xin Zou (None) · Zhipeng Yu (University of the Chinese Academy of Sciences) · Yuan-Chen Guo (Tsinghua University) · Yangguang Li (Shanghai AI Laboratory) · Yan-Pei Cao (Tencent ARC Lab) · Ding Liang (Tsinghua University) · Song-Hai Zhang (Tsinghua University)
CAMEL: CAusal Motion Enhancement tailored for Lifting Text-driven Video Editing
Guiwei Zhang (Beijing University of Aeronautics and Astronautics) · Tianyu Zhang (Du Xiaoman Financial) · Guanglin Niu (Beihang University) · Zichang Tan (Baidu) · Yalong Bai (JD AI Research) · Qing Yang (Du Xiaoman Technology(BeiJing))
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
Mubashir Noman (MBZUAI) · Muzammal Naseer (MBZUAI) · Hisham Cholakkal (MBZUAI) · Rao Anwer (Mohamed bin Zayed University of Artificial Intelligence) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Fahad Shahbaz Khan (MBZUAI; Linköping University)
Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
Yong Shu (Shanghai University) · Liquan Shen (Shanghai University) · Xiangyu Hu (Shanghai University) · Mengyao Li (Shanghai University) · Zihao Zhou (Shanghai University)
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen (Johns Hopkins University) · Qihang Yu (Johns Hopkins University) · Xiaohui Shen (ByteDance) · Alan L. Yuille (Johns Hopkins University) · Liang-Chieh Chen (None)
GraCo: Granularity-Controllable Interactive Segmentation
Yian Zhao (Peking University) · Kehan Li (Peking University) · Zesen Cheng (Peking University) · Pengchong Qiao (Peking University) · Xiawu Zheng (Xiamen University) · Rongrong Ji (Xiamen University) · Chang Liu (Tsinghua University) · Li Yuan (Peking University) · Jie Chen (Peking University)
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Jiye Lee (Seoul National University) · Hanbyul Joo (Seoul National University)
DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation
Yuanchen Wu (Shanghai University) · Xichen Ye (Shanghai University) · Kequan Yang (Shanghai University) · Jide Li () · Xiaoqiang Li (Shanghai University)
Image Neural Field Diffusion Models
Yinbo Chen (University of California, San Diego) · Oliver Wang (Adobe Research) · Richard Zhang (Adobe Systems) · Eli Shechtman (Adobe) · Xiaolong Wang (UCSD) · Michaël Gharbi (Massachusetts Institute of Technology)
Segment Every Out-of-Distribution Object
Wenjie Zhao (University of Texas at Dallas) · Jia Li (None) · Xin Dong (Harvard University) · Yu Xiang (University of Texas, Dallas) · Yunhui Guo (The University of Texas at Dallas)
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Shangchen Zhou (Nanyang Technological University) · Peiqing Yang (S-Lab, Nanyang Technological University) · Jianyi Wang (Nanyang Technological University) · Yihang Luo (Nanyang Technological University) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY)
A Physics-informed Low-rank Deep Neural Network for Blind and Universal Lens Aberration Correction
Jin Gong (Tsinghua University) · Runzhao Yang (Department of Automation, Tsinghua University) · Weihang Zhang (Tsinghua University) · Jinli Suo (Tsinghua University) · Qionghai Dai (Tsinghua University)
DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
Qihao Liu (Johns Hopkins University) · Yi Zhang (Sony Corporation of America) · Song Bai (ByteDance) · Adam Kortylewski (University of Freiburg & MPI-INF) · Alan L. Yuille (Johns Hopkins University)
An Interactive Navigation Method with Effect-oriented Affordance
Xiaohan Wang (Xi'an Jiaotong University) · Yuehu Liu (College of Artificial Intelligence, Xi'an Jiaotong University) · Xinhang Song (None) · Yuyi Liu (Institute of Computing Technology, University of the Chinese Academy of Sciences) · Sixian Zhang (None) · Shuqiang Jiang (Institute of Computing Technology, Chinese Academy of Sciences)
NAPGuard: Towards Detecting Naturalistic Adversarial Patches
Siyang Wu (None) · Jiakai Wang (Zhongguancun Laboratory) · Jiejie Zhao (Zhongguancun Laboratory) · Yazhe Wang (Zhongguancun Laboratory) · Xianglong Liu (BUAA)
A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning
Xiaoyang Xu (Wuhan University) · Mengda Yang (None) · Wenzhe Yi (Wuhan University) · Ziang Li (None) · Juan Wang (None) · Hongxin Hu (State University of New York, Buffalo) · Yong ZHUANG (Wuhan University) · Yaxin Liu (Wuhan University)
Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball
Simon Weber (Technische Universität München) · Barış Zöngür (Technische Universität München) · Nikita Araslanov (TU Munich) · Daniel Cremers (Technical University Munich)
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin (None) · Yi Jiang (bytedance) · Lizhen Qu (Monash University) · Zehuan Yuan (Nanjing University) · Jianfei Cai (Monash University)
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan (Zhejiang University) · Wenguan Wang (Zhejiang University) · Zhibo Tian (Lanzhou University) · Fan Ma (None) · Yi Yang (Zhejiang University)
Rethinking Multi-domain Generalization with A General Learning Objective
Zhaorui Tan (None) · Xi Yang (Xi'an Jiaotong-Liverpool University) · Kaizhu Huang (Duke Kunshan University)
A Theory of Joint Light and Heat Transport for Lambertian Scenes
Mani Ramanagopal (Carnegie Mellon University) · Sriram Narayanan (Carnegie Mellon University) · Aswin C. Sankaranarayanan (Carnegie Mellon University) · Srinivasa G. Narasimhan (Carnegie Mellon University)
Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset
Yujin Jeon (Pohang University of Science and Technology) · Eunsue Choi (Pohang University of Science and Technology) · Youngchan Kim (Pohang University of Science and Technology) · Yunseong Moon (Pohang University of Science and Technology) · Khalid Omer (Meta Reality Labs) · Felix Heide (Department of Computer Science, Princeton University) · Seung-Hwan Baek (POSTECH)
Efficient Stitchable Task Adaptation
Haoyu He (Monash University) · Zizheng Pan (None) · Jing Liu () · Jianfei Cai (Monash University) · Bohan Zhuang (Monash University)
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng (None) · Yan Xie (None) · Hao Zhang (Xidian University, Xi'an, China) · Chiyu Chen (Xi'an University of Electronic Science and Technology) · Zhengjue Wang (Xidian University) · Bo Chen (Xidian University)
MuGE: Multiple Granularity Edge Detection
Caixia Zhou (None) · Yaping Huang (Beijing Jiaotong University) · Mengyang Pu (North China Electric Power University) · Qingji Guan (Beijing Jiaotong University) · Ruoxi Deng (Wenzhou University) · Haibin Ling (State University of New York, Stony Brook)
Efficient Multitask Dense Predictor via Binarization
Yuzhang Shang (Illinois Institute of Technology) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology) · Gaowen Liu (None) · Ramana Kompella (Cisco) · Yan Yan (Illinois Institute of Technology)
Novel View Synthesis with View-Dependent Effects from a Single Image
Juan Luis Gonzalez Bello (KAIST) · Munchurl Kim (Korea Advanced Institute of Science and Technology)
Wired Perspectives: Multi-View Wire Art Embraces Generative AI
Zhiyu Qu (University of Surrey) · LAN YANG (Beijing University of Posts and Telecommunications) · Honggang Zhang (Beijing University of Posts and Telecommunications) · Tao Xiang (University of Surrey) · Kaiyue Pang (SketchX AI) · Yi-Zhe Song (None)
Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
Hongwei Yan (Tsinghua University) · Liyuan Wang (Tsinghua University) · Kaisheng Ma (Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University) · Yi Zhong (Tsinghua University)
Small Scale Data-Free Knowledge Distillation
He Liu (None) · Yikai Wang (Tsinghua University) · Huaping Liu (Tsinghua University) · Fuchun Sun (Tsinghua University) · Anbang Yao (Intel)
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features
Andre Rochow (University of Bonn) · Max Schwarz (University of Bonn) · Sven Behnke (University of Bonn)
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Yuqi Wang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Yuntao Chen (CAIR, HKISI, CAS) · Xingyu Liao (University of Science and Technology of China) · Lue Fan (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Zhaoxiang Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Cheeun Hong (Seoul National University) · Kyoung Mu Lee (Seoul National University)
Domain Separation Graph Neural Networks for Saliency Object Ranking
Zijian Wu (Nanjing University of Science and Technology) · Jun Lu (Nanjing University of Science and Technology) · Jing Han (Nanjing University Of Science And Technology) · Lianfa Bai (Nanjing University of Science and Technology) · Yi Zhang (Nanjing University of Science and Technology) · Zhuang Zhao (Nanjing University of Science and Technology) · Siyang Song (University of Leicester)
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao (Sun Yat-Sen University) · Xiawu Zheng (Xiamen University) · Guanhong Wang (Zhejiang University) · Weijiang Yu (SUN YAT-SEN UNIVERSITY) · Yunhang Shen (Tencent) · Ke Li (Tencent) · Yutong Lu (SUN YAT-SEN UNIVERSITY) · Yonghong Tian (Peking University)
Improving Image Restoration through Removing Degradations in Textual Representations
Jingbo Lin (Harbin Institute of Technology) · Zhilu Zhang (Harbin Institute of Technology) · Yuxiang Wei (The Hong Kong Polytechnic University) · Dongwei Ren (Harbin Institute of Technology) · Dongsheng Jiang (Huawei Technologies Ltd.) · Qi Tian (Huawei Technologies Ltd.) · Wangmeng Zuo (Harbin Institute of Technology)
Activity-Biometrics: Person Identification from Daily Activities
Shehreen Azad (University of Central Florida) · Yogesh S. Rawat (University of Central Florida)
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
Ming Xu (The Australian National University) · Stephen Gould (Australian National University)
Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation
Shenshen Bu (Sun Yat-sen University) · Taiji Li (SUN YAT-SEN UNIVERSITY) · Zhiming Dai (SUN YAT-SEN UNIVERSITY) · Yuedong Yang (SUN YAT-SEN UNIVERSITY)
HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
Zhiying Leng (Beihang University) · Tolga Birdal (Imperial College London) · Xiaohui Liang (Zhongguancun Laboratory) · Federico Tombari (Google, TUM)
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
Junwen Huang (Technische Universität München) · Hao Yu (Technical University Munich) · Kuan-Ting Yu (XYZ Robotics) · Nassir Navab (TU Munich) · Slobodan Ilic (Technical University Munich) · Benjamin Busam (Technical University of Munich)
Towards Variable and Coordinated Holistic Co-Speech Motion Generation
Yifei Liu (South China University of Technology) · Qiong Cao (JD Explore Academy) · Yandong Wen (Max Planck Institute for Intelligent Systems) · Huaiguang Jiang (South China University of Technology) · Changxing Ding (South China University of Technology)
Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
Zhenyu Zhou (Zhejiang University) · Defang Chen (Zhejiang University) · Can Wang (Zhejiang University) · Chun Chen (Zhejiang University)
Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models
Huimin Huang (Zhejiang University) · Yawen Huang (None) · Lanfen Lin (Zhejiang University) · Ruofeng Tong (None) · Yen-Wei Chen (Ritsumeikan University) · Hao Zheng (Tencent) · Yuexiang Li (Tencent Jarvis Lab) · Yefeng Zheng (None)
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concept
Yong Hyun Ahn (Kyung Hee University) · Hyeon Bae Kim (Kyung Hee University) · Seong Tae Kim (Kyung Hee University)
ToonerGAN: Reinforcing GANs for Obfuscating Automated Facial Indexing
Kartik Thakral (IIT Jodhpur) · Shashikant Prasad (Indian Institute of Technology, Jodhpur, Dhirubhai Ambani Institute Of Information and Communication Technology) · Stuti Aswani (Indian Institute of Technology, Jodhpur) · Mayank Vatsa (IIT Jodhpur) · Richa Singh (IIT Jodhpur)
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
Wenxuan Wang (National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) · Tongtian Yue (Institute of automation, Chinese academy of science) · Yisi Zhang (University of Science and Technology Beijing) · Longteng Guo (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Xingjian He (Institute of automation, Chinese academy of science) · Xinlong Wang (Beijing Academy of Artificial Intelligence) · Jing Liu (Institute of automation, Chinese academy of science)
Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
Junuk Cha (UNIST) · Jihyeon Kim (Ulsan National Institute of Science and Technology) · Jae Shin Yoon (Adobe Systems) · Seungryul Baek (UNIST)
Video Frame Interpolation via Direct Synthesis with the Event-based Reference
Yuhan Liu () · Yongjian Deng (Beijing University of Technology) · Hao Chen (Southeast University) · Zhen Yang (Beijing University of Technology)
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
Xiao Lin (University of Science and Technology of China) · Wenfei Yang (University of Science and Technology of China) · Yuan Gao (University of Science and Technology of China) · Tianzhu Zhang (University of Science and Technology of China)
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Bo-Yuan Sun (Nankai University) · Yuqi Yang (Nankai University) · Le Zhang (University of Electronic Science and Technology of China) · Ming-Ming Cheng (Nankai University, Tsinghua University) · Qibin Hou (Nankai University)
MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation
Haokai Zhu (Zhejiang University) · Si-Yuan Cao (Zhejiang University) · Jianxin Hu (Zhejiang University) · Sitong Zuo (Beijing University of Posts and Telecommunications) · Beinan Yu (Zhejiang University) · Jiacheng Ying (Zhejiang University) · Junwei Li (Zhejiang University) · Hui-Liang Shen (None)
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu (Carnegie Mellon University) · Jiacheng Zhu (Massachusetts Institute of Technology) · William Han (Carnegie Mellon University) · Aditesh Kumar (Carnegie Mellon University) · Karthik Mittal (School of Computer Science, Carnegie Mellon University) · Claire Jin (School of Computer Science, Carnegie Mellon University) · Zhengyuan Yang (Microsoft) · Linjie Li (Microsoft) · Jianfeng Wang (Microsoft) · DING ZHAO (Carnegie Mellon University) · Bo Li (UIUC) · Lijuan Wang (Microsoft)
Open-Set Domain Adaptation for Semantic Segmentation
Seun-An Choe (Kyung Hee University) · Ah-Hyung Shin (Kyung Hee University) · Keon Hee Park (Kyung Hee University) · Jinwoo Choi (Kyung Hee University) · Gyeong-Moon Park (Kyung Hee University)
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen (Harbin Institute of Technology) · Leyang Shen (Harbin Institute of Technology) · Rui Shao (Harbin Institute of Technology) · Xiang Deng (Harbin Institute of Technology (Shenzhen)) · Liqiang Nie (Harbin Institute of Technology (Shenzhen))
Pixel Aligned Language Models
Jiarui Xu (University of California, San Diego) · Xingyi Zhou (Google) · Shen Yan (Google Research) · Xiuye Gu (None) · Anurag Arnab (Google) · Chen Sun (Brown University) · Xiaolong Wang (UCSD) · Cordelia Schmid (Inria / Google)
Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection
Zhiyuan Yan (Tencent YouTu Lab) · Yuhao Luo (The Chinese University of Hong Kong, Shenzhen) · Siwei Lyu (State University of New York, Buffalo) · Qingshan Liu (Nanjing University of Posts and Telecommunications) · Baoyuan Wu (The Chinese University of Hong Kong, Shenzhen)
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu (Tsinghua University) · Xingxuan Zhang (Tsinghua University) · Renzhe Xu (Tsinghua University) · Jiashuo Liu (Tsinghua University) · Yue He (Tsinghua University) · Peng Cui (Tsinghua University)
PFStorer: Personalized Face Restoration and Super-Resolution
Tuomas Varanka (University of Oulu) · Tapani Toivonen (Huawei Technologies Ltd.) · Soumya Tripathy (Huawei Technologies Ltd. Finland) · Guoying Zhao (None) · Erman Acar (Huawei Technologies)
Adapters Strike Back
Jan-Martin Steitz (None) · Stefan Roth (None)
Eclipse: Disambiguating Illumination and Materials using Unintended Shadows
Dor Verbin (None) · Ben Mildenhall (Google) · Peter Hedman (Google) · Jonathan T. Barron (Google) · Todd Zickler (Harvard University) · Pratul P. Srinivasan (Google Research)
ASAM: Boosting Segment Anything Model with Adversarial Tuning
Bo Li (vivo Mobile Communication Co.,Ltd.) · Haoke Xiao (Xiamen University) · Lv Tang (vivo Mobile Communication Co., Ltd)
ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
Muhammad Hamza Mughal (Max-Planck Institute for Informatics) · Rishabh Dabral (Saarland Informatics Campus, Max-Planck Institute) · Ikhsanul Habibie (Saarland Informatics Campus, Max-Planck Institute) · Lucia Donatelli (Vrije Universiteit Amsterdam) · Marc Habermann (Saarland Informatics Campus, Max-Planck Institute) · Christian Theobalt (MPI Informatik)
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Bowen Wen (NVIDIA) · Wei Yang (NVIDIA) · Jan Kautz (NVIDIA) · Stan Birchfield (NVIDIA)
Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
Beichen Zhang (Shanghai Jiaotong University) · Xiaoxing Wang (Shanghai Jiao Tong University) · Xiaohan Qin (None) · Junchi Yan (Shanghai Jiao Tong University)
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Xu Yang (South China University of Technology) · Changxing Ding (South China University of Technology) · Zhibin Hong (HeyGen) · Junhao Huang (HeyGen) · Jin Tao (South China University of Technology) · Xiangmin Xu (South China University of Technology)
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su (Zhejiang University) · Peihan Miao (Zhejiang University) · Huanzhang Dou (Zhejiang University) · Xi Li (Zhejiang University)
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang (None) · Yanhong Zeng (None) · Ke Fan (Shanghai Jiaotong University) · Xuheng Wang (Tsinghua University) · Bo Dai (Shanghai AI Laboratory) · Kai Chen (Shanghai AI Laboratory) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee (University of California, Merced) · Hung-Yu Tseng (Meta) · Hsin-Ying Lee (Snap Inc.) · Ming-Hsuan Yang (University of California at Merced)
GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia (Tsinghua University) · Dongchen Han (Tsinghua University) · Yizeng Han (Tsinghua University) · Xuran Pan (Tsinghua University) · Shiji Song (Tsinghua University) · Gao Huang (Tsinghua University)
ElasticDiffusion: Training-free Arbitrary Size Image Generation
Moayed Haji Ali (Rice University) · Guha Balakrishnan (Rice University) · Vicente Ordonez (Rice University)
Uncertainty Visualization via Low-Dimensional Posterior Projections
Omer Yair (Technion) · Tomer Michaeli (Technion) · Elias Nehme (Electrical Engineering Department, Technion – Israel Institute of Technology)
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang (Meta AI) · Donghyun Kim (Korea University) · Zihang Meng (Meta) · Dat Huynh (Meta) · Ser-Nam Lim (Meta AI)
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu (University of California, San Diego) · Saining Xie (Facebook)
Real-Time Neural BRDF with Spherically Distributed Primitives
Yishun Dou (Huawei) · Zhong Zheng (huawei.com) · Qiaoqiao Jin (Shanghai Jiao Tong University) · Bingbing Ni (Shanghai Jiao Tong University) · Yugang Chen (Hisilicon) · Junxiang Ke (Huawei Technologies Ltd.)
RCL: Reliable Continual Learning for Unified Failure Detection
Fei Zhu (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Zhen Cheng (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Xu-Yao Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Cheng-Lin Liu (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Zhaoxiang Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara (Georgia Institute of Technology) · Bariscan Kurtkaya (Koc University) · Hidir Yesiltepe (Virginia Polytechnic Institute and State University) · James Rehg (None) · Pinar Yanardag (Virginia Polytechnic Institute and State University)
Geometry Transfer for Stylizing Radiance Fields
Hyunyoung Jung (Seoul National University) · Seonghyeon Nam (Facebook) · Nikolaos Sarafianos (Meta Reality Labs) · Sungjoo Yoo (None) · Alexander Sorkine-Hornung (Meta) · Rakesh Ranjan ()
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace (SalesForce.com) · Meihua Dang (Stanford University) · Rafael Rafailov (Stanford University) · Linqi Zhou (Stanford University) · Aaron Lou (Stanford University) · Senthil Purushwalkam (None) · Stefano Ermon (Stanford University) · Caiming Xiong (Salesforce Research) · Shafiq Joty (SalesForce.com) · Nikhil Naik (MIT)
CSTA: CNN-based Spatiotemporal Attention for Video Summarization
Jaewon Son (Sungkyunkwan University) · Jaehun Park (Sung Kyun Kwan University) · Kwangsu Kim (Department of Computer Science & Engineering, College of Computing, Sungkyunkwan University)
Sieve: Multimodal Dataset Pruning using Image-Captioning Models
Anas Mahmoud (University of Toronto) · Mostafa Elhoushi (Meta, FAIR) · Amro Abbas (Meta) · Yu Yang (University of California, Los Angeles) · Newsha Ardalani (Facebook) · Hugh Leather (Facebook) · Ari Morcos (Meta AI (FAIR))
AMU-Tuning: Learning Effective Bias for CLIP-based Few-shot Classification
Yuwei Tang (Tianjin University) · ZhenYi Lin (TianJin University) · Qilong Wang (university of tianjin of china) · Pengfei Zhu (Tianjin University) · Qinghua Hu (Tianjin University)
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
Song Wang (Zhejiang University) · Jiawei Yu (Zhejiang University) · Wentong Li (College of Computer Science and Technology, Zhejiang University) · Wenyu Liu (Zhejiang University) · Xiaolu Liu (Zhejiang University) · Junbo Chen (UDEER AI PTE.LTD) · Jianke Zhu (Zhejiang University)
Towards Fairness-Aware Adversarial Learning
Yanghao Zhang (University of Liverpool) · Tianle Zhang (University of Liverpool) · Ronghui Mu (Lancaster University) · Xiaowei Huang (University of Liverpool) · Wenjie Ruan (University of Exeter)
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu (None) · Yifei Huang (The University of Tokyo) · Junlin Hou (Hong Kong University of Science and Technology) · Guo Chen (Nanjing University) · Yuejie Zhang (Fudan University) · Rui Feng (Fudan University) · Weidi Xie (Shanghai Jiaotong University)
Low-Rank Knowledge Decomposition for Medical Foundation Models
Yuhang Zhou () · Haolin Li (Fudan University) · Siyuan Du (Fudan University) · Jiangchao Yao (Shanghai Jiaotong University) · Ya Zhang (Shanghai Jiao Tong University) · Yanfeng Wang (Shanghai Jiao Tong University)
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
Shivangi Aneja (Technical University of Munich) · Justus Thies (Max-Planck Institute for Intelligent Systems) · Angela Dai () · Matthias Nießner (Technical University of Munich)
Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration
Yixuan Sun (Fudan University) · Zhangyue Yin (Fudan University) · Haibo Wang (None) · Yan Wang (Fudan University) · Xipeng Qiu (Fudan University) · Weifeng Ge (Fudan University) · Wenqiang Zhang (None)
CPR: Retrieval Augmented Generation for Copyright Protection
Aditya Golatkar (University of California, Los Angeles) · Alessandro Achille (California Institute of Technology) · Luca Zancato (AWS AI Labs) · Yu-Xiang Wang (UC Santa Barbara / Amazon) · Ashwin Swaminathan (University of Maryland, College Park) · Stefano Soatto (AWS)
Event-assisted Low-Light Video Object Segmentation
Li Hebei (University of Science and Technology of China) · Jin Wang (University of Science and Technology of China) · Jiahui Yuan (University of Science and Technology of China) · Yue Li (None) · Wenming Weng (None) · Yansong Peng (None) · Yueyi Zhang (University of Science and Technology of China) · Zhiwei Xiong (None) · Xiaoyan Sun (University of Science and Technology of China)
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng (Fudan University) · Sicheng Xie (Fudan University) · Zuyao You (Fudan University) · Shiyi Lan (NVIDIA CORPORATION) · Zuxuan Wu (Fudan University)
Animating General Image with Large Visual Motion Model
Dengsheng Chen (Meituan) · Xiaoming Wei (Meituan) · Xiaolin Wei (Meituan)
DeIL: Direct and Inverse CLIP for Open-World Few-Shot Learning
Shuai Shao (Zhejiang Lab) · Yu Bai (China University of Petroleum (East China)) · Yan Wang (Beihang University) · Bao-di Liu (China University of Petroleum (East China)) · Yicong Zhou (University of Macau)
FedAS: Bridging Inconsistency in Personalized Federated Learning
Xiyuan Yang (Wuhan University) · Wenke Huang (Wuhan University) · Mang Ye (Wuhan University)
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi (None) · Ye Fang (None) · Zeyi Sun (Shanghai Jiao Tong University) · Xiaoyang Wu (The University of Hong Kong) · Tong Wu (None) · Jiaqi Wang (Shanghai AI Laboratory) · Dahua Lin (The Chinese University of Hong Kong) · Hengshuang Zhao (The University of Hong Kong)
Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng (None) · Li Hebei (University of Science and Technology of China) · Yueyi Zhang (University of Science and Technology of China) · Xiaoyan Sun (University of Science and Technology of China) · Feng Wu (University of Science and Technology of China)
Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
Leheng Zhang (University of Electronic Science and Technology of China) · Yawei Li (ETH Zurich) · Xingyu Zhou (University of Electronic Science and Technology of China) · Xiaorui Zhao (None) · Shuhang Gu (University of Electronic Science and Technology of China)
Rendering Every Pixel for High-Fidelity Geometry in 3D GANs
Alex Trevithick (None) · Matthew Chan (NVIDIA) · Towaki Takikawa (NVIDIA) · Umar Iqbal (None) · Shalini De Mello (NVIDIA Research) · Manmohan Chandraker (UC San Diego) · Ravi Ramamoorthi (None) · Koki Nagano (None)
Residual Learning in Diffusion Models
Junyu Zhang (Central South University) · Daochang Liu (University of Sydney) · Eunbyung Park (SKKU) · Shichao Zhang (Central South University) · Chang Xu (University of Sydney)
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Bang-Dang Pham (VinAI Research) · Phong Tran (MBZUAI) · Anh Tran (VinAI Research) · Cuong Pham (Posts & Telecommunications Institute of Technology and VinAI Research) · Rang Nguyen (VinAI Research) · Minh Hoai (State University of New York, Stony Brook)
FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Jiahui Zhang (Nanyang Technological University) · Fangneng Zhan (None) · Muyu Xu (Nanyang Technological University) · Shijian Lu (Nanyang Technological University) · Eric P. Xing (Mohamed bin Zayed University of AI)
PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns
Shuliang Ning (The Chinese University of Hong Kong, Shenzhen) · Duomin Wang () · Yipeng Qin (Cardiff University) · Zirong Jin () · Baoyuan Wang (Xiaobing.ai) · Xiaoguang Han (The Chinese University of Hong Kong, Shenzhen)
Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation
Dong Zhao (Xi'an University of Electronic Science and Technology) · Shuang Wang (Xidian University) · Qi Zang (Xidian University) · Licheng Jiao (Xidian University) · Nicu Sebe (University of Trento) · Zhun Zhong (University of Nottingham)
Revisiting Sampson Approximations for Geometric Estimation Problems
Felix Rydell (KTH Royal Institute of Technology) · Angelica Torres (Max Planck Institute for Mathematics in the Sciences) · Viktor Larsson (Lund University)
Neural 3D Strokes: Creating Stylized 3D Scenes with Vectorized 3D Strokes
Haobin Duan (Beihang University) · Miao Wang (Beihang University) · Yanxun Li (Buaa Software Engineering) · Yong-Liang Yang (University of Bath)
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Junwen He (Dalian University of Technology) · Yifan Wang (Dalian University of Technology) · Lijun Wang (Dalian University of Technology) · Huchuan Lu (Dalian University of Technology) · Bin Luo (Alibaba Group) · Jun-Yan He (DAMO Academy, Alibaba Group) · Jin-Peng Lan (Alibaba Group) · Xuansong Xie (Alibaba Group)
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
Subhadeep Koley (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Aneeshan Sain (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Tao Xiang (University of Surrey) · Yi-Zhe Song (None)
Flexible Depth Completion for Sparse and Varying Point Densities
Jinhyung Park (Carnegie Mellon University) · Yu-Jhe Li (Carnegie Mellon University) · Kris Kitani (Carnegie Mellon University)
Sparse Global Matching for Video Frame Interpolation with Large Motion
Chunxu Liu (Nanjing University) · Guozhen Zhang (Nanjing University) · Rui Zhao (Qing Yuan Research Institute, Shanghai Jiao Tong University) · Limin Wang (Nanjing University)
PIGEON: Predicting Image Geolocations
Lukas Haas (Stanford University) · Michal Skreta (Stanford University) · Silas Alberti (Stanford University) · Chelsea Finn (Stanford University)
Improving Generalization via Meta-Learning on Hard Samples
Nishant Jain (Indian Institute of Technology, Roorkee, Dhirubhai Ambani Institute Of Information and Communication Technology) · Arun Suggala (Google) · Pradeep Shenoy (Google)
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung (National Yang Ming Chiao Tung University) · 書緯 呂 (National Yang Ming Chiao Tung University) · Yi-Hsuan Tsai (Google) · Yi-Ting Chen (National Yang Ming Chiao Tung University)
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun (University of Oxford) · Runjia Li (University of Oxford) · Philip H.S. Torr (University of Oxford) · Xiuye Gu (None) · Siyang Li (Google)
LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition
Zhonglin Sun (Queen Mary University of London) · Chen Feng (Queen Mary University of London) · Ioannis Patras (Queen Mary University of London) · Georgios Tzimiropoulos (Queen Mary University of London)
SinSR: Diffusion-Based Image Super-Resolution in a Single Step
Yufei Wang (Nanyang Technological University) · Wenhan Yang (Peng Cheng Lab) · Xinyuan Chen (Shanghai Artificial Intelligence Laboratory) · Yaohui Wang (Shanghai AI Laboratory) · Lanqing Guo (Nanyang Technological University) · Lap-Pui Chau (The Hong Kong Polytechnic University) · Ziwei Liu (Nanyang Technological University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Alex C. Kot (Nanyang Technological University) · Bihan Wen (Nanyang Technological University)
Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning
Sicong Shen (Beihang University) · Yang Zhou (Beihang University) · Bingzheng Wei (Xiaomi Corporation) · Eric Chang (Massachusetts Institute of Technology) · Yan Xu (Beijing University of Aeronautics and Astronautics)
DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF
Jie Long Lee (None) · Chen Li (National University of Singapore) · Gim Hee Lee (National University of Singapore)
Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu (Zhejiang University) · Sida Peng (None) · Chen Geng (Stanford University) · Linzhan Mou (Zhejiang University) · Zihan Yan (University of Illinois Urbana-Champaign) · Jiaming Sun (Image Derivative Inc.) · Hujun Bao (Zhejiang University) · Xiaowei Zhou (None)
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses
Chen Zhao (EPFL) · Tong Zhang (EPFL) · Zheng Dang (None) · Mathieu Salzmann (EPFL)
PostureHMR: Posture Transformation for 3D Human Mesh Recovery
Yu-Pei Song (Southwest Jiaotong University) · Xiao WU (Southwest Jiaotong University) · Zhaoquan Yuan (None) · Jian-Jun Qiao (Southwest Jiaotong University) · Qiang Peng (Southwest Jiaotong University)
VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Jiaqi Lin (Tsinghua University) · Zhihao Li (Huawei Noah's Ark Lab) · Xiao Tang (Huawei Technologies Ltd.) · Jianzhuang Liu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Shiyong Liu (Huawei Noah's Ark Lab) · Jiayue Liu (Tsinghua University) · Yangdi Lu (Huawei Technologies Ltd.) · Xiaofei Wu (Huawei Technologies Ltd.) · Songcen Xu (Huawei Noah's Ark Lab) · Youliang Yan (Huawei Technologies Ltd.) · Wenming Yang (Tsinghua University)
WANDR: Intention-guided Human Motion Generation
Markos Diomataris (None) · Nikos Athanasiou (None) · Omid Taheri () · Xi Wang (None) · Otmar Hilliges (None) · Michael J. Black (University of Tübingen)
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Tao Lu (Nanjing University) · Mulin Yu (Shanghai AI Laboratory) · Linning Xu (The Chinese University of Hong Kong) · Yuanbo Xiangli (None) · Limin Wang (Nanjing University) · Dahua Lin (The Chinese University of Hong Kong) · Bo Dai (Shanghai AI Laboratory)
SimDA: Simple Diffusion Adapter for Efficient Video Generation
Zhen Xing (Fudan University) · Qi Dai (Microsoft Research Asia) · Han Hu (Microsoft Research Asia) · Zuxuan Wu (Fudan University) · Yu-Gang Jiang (Fudan University)
GART: Gaussian Articulated Template Models
Jiahui Lei (University of Pennsylvania) · Yufu Wang (University of Pennsylvania) · Georgios Pavlakos (University of Texas at Austin) · Lingjie Liu (Saarland Informatics Campus, Max-Planck Institute) · Kostas Daniilidis (University of Pennsylvania)
Learning from Observer Gaze: Zero-shot Attention Prediction Oriented by Human-Object Interaction Recognition
Yuchen Zhou (Sun Yat-Sen University) · Linkai Liu (Sun Yat-Sen University) · Chao Gou (Sun Yat-Sen University)
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han (Wuhan University) · Zhiwen Lin (Tencent) · Zhongyisun Sun (Tencent Youtu Lab) · Yingguo Gao (Tencent Youtu Lab) · Ke Yan () · Shouhong Ding (Tencent Youtu Lab) · Yuan Gao (Wuhan University) · Gui-Song Xia (Wuhan University)
Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
Aihua Mao (South China University of Technology) · Biao Yan (None) · Zijing Ma (South China University of Technology) · Ying He (Nanyang Technological University)
Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Karran Pandey (University of Toronto) · Paul Guerrero (Adobe Systems) · Matheus Gadelha (Adobe Systems) · Yannick Hold-Geoffroy (Adobe Research) · Karan Singh (Department of Computer Science) · Niloy J. Mitra (University College London)
PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation
Xinqiao Zhao (Xi’an Jiaotong-Liverpool University) · Ziqian Yang (Xi'an Jiaotong-Liverpool University) · Tianhong Dai (University of Aberdeen) · Bingfeng Zhang (China University of Petroleum (East China)) · Jimin Xiao (Xi'an Jiaotong-Liverpool University)
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
Jisu Nam (Korea University) · Heesu Kim (NAVER) · DongJae Lee (KAIST) · Siyoon Jin (Korea University) · Seungryong Kim (Korea University) · Seunggyu Chang (NAVER Cloud)
COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction
Qihang Ma (East China Normal University) · Xin Tan (East China Normal University) · Yanyun Qu (Xiamen University) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University) · Zhizhong Zhang (East China Normal University) · Yuan Xie (East China Normal University)
Generalizable Novel-View Synthesis using a Stereo Camera
Haechan Lee (Pohang University of Science and Technology) · Wonjoon Jin (Pohang University of Science and Technology) · Seung-Hwan Baek (POSTECH) · Sunghyun Cho (POSTECH)
Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection
Xiaohong Zhang () · Huisheng Ye (Nanjing University) · Jingwen Li (Nanjing University) · Qinyu Tang (Nanjing University) · Yuanqi Li (Nanjing University) · Yanwen Guo (Nanjing University) · Jie Guo (Nanjing University)
Language-driven All-in-one Adverse Weather Removal
Hao Yang (Beijing Institute of Technology) · Liyuan Pan (Beijing Institute of Technology) · Yan Yang (ANU) · Wei Liang (Beijing Institute of Technology)
Efficient Meshflow and Optical Flow Estimation from Event Cameras
Xinglong Luo (None) · Ao Luo (Megvii Technology Inc.) · Zhengning Wang (University of Electronic Science and Technology of China) · Chunyu Lin (Beijing Jiaotong University) · Bing Zeng (None) · Shuaicheng Liu (None)
Volumetric Environment Representation for Vision-Language Navigation
Liu (None) · Wenguan Wang (Zhejiang University) · Yi Yang (Zhejiang University)
LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis
Zehan Zheng (Tongji University) · Fan Lu (Tongji University) · Weiyi Xue (Tongji University) · Guang Chen (Tongji University) · Changjun Jiang (Tongji University)
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
Sanqing Qu (Tongji University) · Tianpei Zou (Tongji University) · Lianghua He (Tongji University) · Florian Röhrbein (Chemnitz University of Technology) · Alois Knoll (Technical University Munich) · Guang Chen (Tongji University) · Changjun Jiang (Tongji University)
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
Christian Diller (Technische Universität München) · Angela Dai ()
Contrastive Mean-Shift Learning for Generalized Category Discovery
Sua Choi (POSTECH) · Dahyun Kang (POSTECH) · Minsu Cho (POSTECH)
Federated Generalized Category Discovery
Nan Pu (University of Trento) · Wenjing Li (University of Science and Technology of China) · Xinyuan Ji (Leiden University) · Yalan Qin (Shanghai University) · Nicu Sebe (University of Trento) · Zhun Zhong (University of Nottingham)
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
Hanxin Zhu (University of Science and Technology of China) · Tianyu He (None) · Xin Li (None) · Bingchen Li (University of Science and Technology of China) · Zhibo Chen (University of Science and Technology of China)
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
Jiazuo Yu (Dalian University of Technology) · Yunzhi Zhuge (Dalian University of Technology) · Lu Zhang (Dalian University of Technology) · Ping Hu (University of Electronic Science and Technology of China) · Dong Wang (Dalian University of Technology) · Huchuan Lu (Dalian University of Technology) · You He (Tsinghua University)
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
Subhadeep Koley (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Aneeshan Sain (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Tao Xiang (University of Surrey) · Yi-Zhe Song (None)
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
Chong Mou (Peking University) · Xintao Wang (Tencent) · Jiechong Song (None) · Ying Shan (Tencent) · Jian Zhang (Peking University)
Iterated Learning Improves Compositionality in Large Vision-Language Models
Chenhao Zheng (University of Michigan) · Jieyu Zhang (Department of Computer Science, University of Washington) · Aniruddha Kembhavi (Allen Institute for Artificial Intelligence) · Ranjay Krishna (University of Washington)
Detours for Navigating Instructional Videos
Kumar Ashutosh (UT Austin & FAIR, Meta) · Zihui Xue (None) · Tushar Nagarajan (Meta) · Kristen Grauman (University of Texas at Austin)
Domain Gap Embeddings for Generative Dataset Augmentation
Yinong Wang (Carnegie Mellon University) · Younjoon Chung (Carnegie Mellon University) · Chen Henry Wu (Carnegie Mellon University) · Fernando De la Torre (Carnegie Mellon)
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
Zhekai Du (University of Electronic Science and Technology of China) · Xinyao Li (University of Electronic Science and Technology of China) · Fengling Li (University of Technology Sydney) · Ke Lu (University of Electronic Science and Technology of China) · Lei Zhu (Shandong Normal University) · Jingjing Li (University of Electronic Science and Technology of China)
TransLoc4D: Transformer-based 4D Radar Place Recognition
Guohao Peng (Nanyang Technological University) · Heshan Li (Nanyang Technological University) · Yangyang Zhao (Nanyang Technological University) · Jun Zhang (Nanyang Technological University) · Zhenyu Wu (Nanyang Technological University) · Pengyu Zheng (Chinese University of Hong Kong) · Danwei Wang (Nanyang Technological University)
Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang (None) · Guoliang Kang (Beihang University)
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli (Indian Institute of Science) · Ashish Asokan (Indian Institute of Science, Bangalore) · Lakshay Sharma (Indian Institute of Science, Bangalore) · R. Venkatesh Babu (Indian Institute of Science)
Towards Learning a Generalist Model for Embodied Navigation
Duo Zheng (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Shijia Huang (The Chinese University of Hong Kong) · Lin Zhao (Beijing Institute of Technology) · Yiwu Zhong (University of Wisconsin, Madison) · Liwei Wang (CUHK)
Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance
Chamin Hewa Koneputugodage (Australian National University) · Yizhak Ben-Shabat (Technion, Israel Institute of Technology) · Dylan Campbell (Australian National University) · Stephen Gould (Australian National University)
Absolute Pose from One or Two Scaled and Oriented Features
Jonathan Ventura (None) · Zuzana Kukelova (Czech Technical University in Prague) · Torsten Sattler (Czech Technical University in Prague) · Daniel Barath (ETHZ - ETH Zurich)
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Mengqi Huang (University of Science and Technology of China) · Zhendong Mao (None) · Mingcong Liu (ByteDance Inc.) · Qian HE (Institute of Remote Sensing Application, Chinese Academic of Sciences) · Yongdong Zhang (University of Science and Technology of China)
Driving Everywhere with Large Language Model Policy Adaptation
Boyi Li (UC Berkeley / NVIDIA) · Yue Wang (Massachusetts Institute of Technology) · Jiageng Mao (CUHK) · Boris Ivanovic (NVIDIA) · Sushant Veer (NVIDIA) · Karen Leung (University of Washington) · Marco Pavone (NVIDIA)
SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu (HKUST) · Benran Hu (The Hong Kong University of Science and Technology) · Chi-Keung Tang (The Hong Kong University of Science and Technology) · Yu-Wing Tai (None)
APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
Weizhao He (None) · Yang Zhang (Shenzhen University) · Wei Zhuo (Shenzhen University) · Linlin Shen (None) · Jiaqi Yang (University of Nottingham) · Songhe Deng (None) · Liang Sun (Shenzhen University)
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
Beomyoung Kim (NAVER Cloud / KAIST) · Joonsang Yu (NAVER) · Sung Ju Hwang (Korea Advanced Institute of Science and Technology)
InstanceDiffusion: Instance-level Control for Image Generation
Xudong Wang (Electrical Engineering & Computer Science Department, University of California Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Sai Saketh Rambhatla (Meta) · Rohit Girdhar (Meta) · Ishan Misra (Facebook)
Shadow Generation for Composite Image Using Diffusion Model
Qingyang Liu (Shanghai Jiao Tong University) · Junqi You (Shanghai Jiao Tong University) · Jian-Ting Wang (Shanghai Jiao Tong University) · Xinhao Tao (Shanghai Jiao Tong University) · Bo Zhang (Shanghai Jiao Tong University) · Li Niu ()
DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
Hao Yan (Tianjin University) · Zhihui Ke (Tianjin University) · Xiaobo Zhou (Tianjin University) · Tie Qiu (Tianjin University) · Xidong Shi (Tianjin University) · DaDong Jiang (Tianjin University)
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
Ganlong Zhao (University of Hong Kong) · Guanbin Li (Sun Yat-sen University) · Weikai Chen (Tencent America) · Yizhou Yu (The University of Hong Kong)
Rolling Shutter Correction with Intermediate Distortion Flow Estimation
Mingdeng Cao (The University of Tokyo) · Sidi Yang (Shenzhen International Graduate School, Tsinghua University) · Yujiu Yang (Tsinghua University) · Yinqiang Zheng (None)
Towards Transferable Targeted 3D Adversarial Attack in the Physical World
Yao Huang (Beihang University) · Yinpeng Dong (Tsinghua University) · Shouwei Ruan (None) · Xiao Yang (Tsinghua University) · Hang Su (Tsinghua University) · Xingxing Wei (None)
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen (The University of Hong Kong) · Lianghua Huang (Alibaba Group) · Yu Liu (Alibaba Group) · Yujun Shen (The Chinese University of Hong Kong) · Deli Zhao (Alibaba Group) · Hengshuang Zhao (The University of Hong Kong)
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
Gege Gao (ETH Zürich) · Weiyang Liu (University of Cambridge) · Anpei Chen (Department of Computer Science, ETHZ - ETH Zurich) · Andreas Geiger (University of Tübingen) · Bernhard Schölkopf (ELLIS Institute)
Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion
Jiangtong Tan (University of Science and Technology of China) · Jie Huang (University of Science and Technology of China) · Naishan Zheng (University of Science and Technology of China) · Man Zhou (University of Science and Technology of China) · Keyu Yan (University of Science and Technology of China) · Danfeng Hong (Chinese Academy of Sciences, Aerospace Information Research Institute) · Feng Zhao (University of Science and Technology of China)
3D Facial Expressions through Analysis-by-Neural-Synthesis
George Retsinas (None) · Panagiotis Filntisis (None) · Radek Danecek (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Victoria Abrevaya (None) · Anastasios Roussos (Foundation for Research and Technology - Hellas) · Timo Bolkart (Google) · Petros Maragos (National Technical University of Athens)
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang (Tsinghua University) · Yinpeng Dong (Tsinghua University) · Siyuan Zhang (None) · Tianzan Min (Tsinghua University) · Hang Su (Tsinghua University) · Jun Zhu (Tsinghua University)
Unified Language-driven Zero-shot Domain Adaptation
Senqiao Yang (Harbin Institute of Technology) · Zhuotao Tian (The Chinese University of Hong Kong) · Li Jiang (Max Planck Institute for Informatics) · Jiaya Jia (The Chinese University of Hong Kong)
Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
Jing Ma (None) · Xiang Xiang (Huazhong University of Science and Technology) · Ke Wang (Alibaba Group) · Yuchuan Wu (Alibaba Group) · Yongbin Li (Alibaba Group)
HomoFormer: Homogenized Transformer for Image Shadow Removal
Jie Xiao (University of Science and Technology of China) · Xueyang Fu (University of Science and Technology of China) · Yurui Zhu (University of Science and Technology of China) · Dong Li (University of Science and Technology of China) · Jie Huang (University of Science and Technology of China) · Kai Zhu (University of Science and Technology of China) · Zheng-Jun Zha (University of Science and Technology of China)
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi (None) · Svetlana Orlova (Eindhoven University of Technology) · Daan de Geus (Eindhoven University of Technology) · Gijs Dubbelman (Eindhoven University of Technology)
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang (Zhejiang University) · Xingyi He (Zhejiang University) · Sida Peng (None) · Dongli Tan (Zhejiang University) · Xiaowei Zhou (None)
Language-guided Image Reflection Separation
Haofeng Zhong (Peking University) · Yuchen Hong (Peking University) · Shuchen Weng (Peking University) · Jinxiu Liang (Peking University) · Boxin Shi (Peking University)
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang (None) · Zhening Xing (Shanghai AI Laboratory) · Yanhong Zeng (None) · Youqing Fang (Anhui University) · Kai Chen (Shanghai AI Laboratory)
Motion Diversification Networks
Hee Jae Kim (Boston University) · Eshed Ohn-Bar (Boston University)
On the Scalability of Diffusion-based Text-to-Image Generation
Hao Li (AWS AI Labs) · Yang Zou (Amazon) · Ying Wang (Amazon) · Orchid Majumder (Amazon Web Services) · Yusheng Xie (Amazon) · R. Manmatha (Amazon) · Ashwin Swaminathan (University of Maryland, College Park) · Zhuowen Tu (University of California, San Diego) · Stefano Ermon (Stanford University) · Stefano Soatto (AWS)
BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation
Jiahao Lu (University of Science and Technology of China) · Jiacheng Deng (University of Science and Technology of China) · Tianzhu Zhang (University of Science and Technology of China)
Unlocking Pretrained Image Backbones for Semantic Image Synthesis
Tariq Berrada (Meta) · Jakob Verbeek (Meta AI) · Camille Couprie (Facebook) · Karteek Alahari (Inria)
HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D
Sangmin Woo (Korea Advanced Institute of Science & Technology) · Byeongjun Park () · Hyojun Go (Twelvelabs) · Jin-Young Kim (Yonsei University) · Changick Kim (Korea Advanced Institute of Science and Technology)
Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Semantic Segmentation
Jiafan Zhuang (Shantou University) · Zilei Wang (University of Science and Technology of China) · Yixin Zhang (University of Science and Technology of China) · Zhun Fan (Shantou University)
Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
Jonas Herzog (None)
FreeU: Free Lunch in Diffusion U-Net
Chenyang Si (Nanyang Technological University Singapore) · Ziqi Huang (Nanyang Technological University) · Yuming Jiang (Nanyang Technological University) · Ziwei Liu (Nanyang Technological University)
From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers
Swaminathan Gurumurthy (School of Computer Science, Carnegie Mellon University) · Karnik Ram (Technische Universität München) · Bingqing Chen (Bosch) · Zachary Manchester (Carnegie Mellon University) · Zico Kolter (Carnegie Mellon University)
Image Restoration by Denoising Diffusion Models With Iteratively Preconditioned Guidance
Tomer Garber (Open University of Israel) · Tom Tirer (Bar-Ilan University)
Mean-Shift Feature Transformer
Takumi Kobayashi (National Institute of Advanced Industrial Science and Technology (AIST))
SFOD: Spiking Fusion Object Detector
Yimeng Fan (School of Microelectronics, Tianjin University) · Wei Zhang (None) · Changsong Liu (Tianjin University) · Mingyang Li (Tianjin University) · Wenrui Lu (Tianjin University)
RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo (The University of Hong Kong) · Shalini De Mello (NVIDIA Research) · Danny Yin (NVIDIA) · Wonmin Byeon (NVIDIA) · Ka Chun Cheung (NVIDIA) · Yizhou Yu (The University of Hong Kong) · Ping Luo (The University of Hong Kong) · Sifei Liu (NVIDIA)
Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors
Ziqin Zhou (None) · Hai-Ming Xu (The University of Adelaide) · Yangyang Shu (None) · Lingqiao Liu (None)
Relational Matching for Weakly Semi-Supervised Oriented Object Detection
Wenhao Wu (City University of Hong Kong) · Hau San Wong (City University of Hong Kong) · Si Wu (South China University of Technology) · Tianyou Zhang (South China University of Technology)
JointSQ: Joint Sparsification-Quantization for Distributed Learning
Weiying Xie (None) · Haowei Li (None) · Ma Jitao (None) · Yunsong Li () · Jie Lei (Xi'an University of Electronic Science and Technology) · Donglai Liu (Xi'an University of Electronic Science and Technology) · Leyuan Fang (None)
Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection
Wenjun Hui (None) · Zhenfeng Zhu (Beijing Jiaotong University) · Shuai Zheng (Beijing Jiaotong University) · Yao Zhao (Beijing Jiaotong University)
NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning
Mustafa B Gurbuz (Georgia Institute of Technology) · Jean Moorman (Georgia Institute of Technology) · Constantine Dovrolis (Georgia Institute of Technology)
Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
Axel Barroso-Laguna (None) · Sowmya Munukutla (None) · Victor Adrian Prisacariu (None) · Eric Brachmann (None)
Learning for Transductive Threshold Calibration in Open-World Recognition
Qin ZHANG (Amazon) · DONGSHENG An (Amazon) · Tianjun Xiao (Amazon) · Tong He (Amazon Web Services) · Qingming Tang (Amazon, Alexa) · Ying Nian Wu (UCLA) · Joseph Tighe (Meta) · Yifan Xing (None)
LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation
Xuecan Wang (None) · Shibang Xiao (Beijing University of Aeronautics and Astronautics) · Xiaohui Liang (Zhongguancun Laboratory)
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Ege Ozguroglu (Columbia University) · Ruoshi Liu (Columbia University) · Dídac Surís (Columbia University) · Dian Chen (Toyota Research Institute) · Achal Dave (None) · Pavel Tokmakov (Toyota Research Institute) · Carl Vondrick (Columbia University)
Navigate Beyond Shortcuts: Debiased Learning through the Lens of Neural Collapse
Yining Wang (Fudan University) · Junjie Sun (Fudan University) · Chenyue Wang (Fudan University) · Mi Zhang (Fudan University) · Min Yang (Fudan University)
SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching
Xinghui Li (University of Oxford) · Jingyi Lu (University of Hong Kong) · Kai Han (The University of Hong Kong) · Victor Adrian Prisacariu (None)
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Jiakai Sun (Zhejiang University) · Han Jiao (Zhejiang University) · Guangyuan Li (Zhejiang University) · Zhanjie Zhang (Zhejiang University) · Lei Zhao (Zhejiang University) · Wei Xing (Zhejiang University)
TextCraftor: Your Text Encoder Can be Image Quality Controller
Yanyu Li (Northeastern University) · Xian Liu (The Chinese University of Hong Kong) · Anil Kag (Snap Inc.) · Ju Hu (Snap Inc.) · Yerlan Idelbayev (Snap Inc.) · Dhritiman Sagar (Snap Inc.) · Yanzhi Wang (Northeastern University) · Sergey Tulyakov (Snap Inc.) · Jian Ren (Snap Inc.)
3D Human Pose Perception from Egocentric Stereo Videos
Hiroyasu Akada (Max Planck Institute for Informatics) · Jian Wang (Max Planck Institute for Informatics) · Vladislav Golyanik (MPI for Informatics) · Christian Theobalt (MPI Informatik)
Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching
Shitong Shao (Southeast University) · Zeyuan Yin (Mohamed bin Zayed University of Artificial Intelligence) · Muxin Zhou (Mohamed bin Zayed University of Artificial Intelligence) · Xindong Zhang (The Hong Kong Polytechnic University) · Zhiqiang Shen (Mohamed bin Zayed University of Artificial Intelligence)
AAMDM: Accelerated Auto-regressive Motion Diffusion Model
Tianyu Li (Georgia Institute of Technology) · Calvin Zhuhan Qiao (University of British Columbia) · Ren Guanqiao (Beijing University of Aeronautics and Astronautics) · KangKang Yin (Simon Fraser University) · Sehoon Ha (Georgia Institute of Technology)
TexOct: Generating Textures of 3D Models with Octree-based Diffusion
Jialun Liu (Baidu) · Chenming Wu (None) · Xinqi Liu (Baidu Inc) · Xing Liu (Baidu) · Jinbo Wu (Baidu) · Haotian Peng (Baidu) · Chen Zhao (None) · Haocheng Feng (Baidu) · Jingtuo Liu (Baidu) · Errui Ding (Baidu Inc.)
OTE: Exploring Accurate Scene Text Recognition Using One Token
Jianjun Xu (University of Science and Technology of China) · Yuxin Wang (University of Science and Technology of China) · Hongtao Xie (University of Science and Technology of China) · Yongdong Zhang (University of Science and Technology of China)
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang (None) · Dongdong Chen (Microsoft Research) · Chong Luo (Microsoft Research Asia) · Bo He (None) · Lu Yuan (Microsoft) · Zuxuan Wu (Fudan University) · Yu-Gang Jiang (Fudan University)
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
Biao Gong (Alibaba Group) · Siteng Huang (Zhejiang University & Westlake University) · Yutong Feng (Alibaba Group) · Shiwei Zhang (Alibaba Group) · Yuyuan Li (Zhejiang University) · Yu Liu (Alibaba Group)
LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation
Nisarg Shah (Johns Hopkins University) · Vibashan VS (Johns Hopkins University) · Vishal M. Patel (Johns Hopkins University)
Latent Modulated Function for Computational Optimal Continuous Image Representation
Zongyao He (Sun Yat-sen University) · Zhi Jin (Sun Yat-sen University)
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Bingxin Ke (ETH Zurich) · Anton Obukhov (None) · Shengyu Huang (None) · Nando Metzger (ETH Zürich) · Rodrigo Caye Daudt (ETH Zurich) · Konrad Schindler (ETH Zurich)
LiDAR-based Person Re-identification
Wenxuan Guo (Tsinghua University) · Zhiyu Pan (Department of Automation, Tsinghua University) · Yingping Liang (None) · Ziheng Xi (Tsinghua University) · Zhi Chen Zhong (Tsinghua University) · Jianjiang Feng (Tsinghua University) · Jie Zhou (None)
Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification
Bin Yang (Wuhan University) · Jun Chen (Wuhan University) · Mang Ye (Wuhan University)
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
Sangyun Shin (University of Oxford) · Kaichen Zhou (Department of Computer Science, University of Oxford) · Madhu Vankadari (Department of Computer Science, University of Oxford) · Andrew Markham (University of Oxford) · Niki Trigoni (University of Oxford)
Neural Spline Fields for Burst Image Fusion and Layer Separation
Ilya Chugunov (Princeton University) · David Shustin (Princeton University) · Ruyu Yan (Princeton University) · Chenyang Lei (The Hong Kong University of Science and Technology) · Felix Heide (Department of Computer Science, Princeton University)
L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Yuyin Zhou (UC Santa Cruz) · Xianhang Li (University of California, Santa Cruz) · Fengze Liu (ByteDance) · Qingyue Wei (Stanford University) · Xuxi Chen (University of Texas at Austin) · Lequan Yu (The University of Hong Kong) · Cihang Xie (University of California, Santa Cruz) · Matthew P. Lungren (Microsoft) · Lei Xing (Stanford University)
Deep Video Inverse Tone Mapping Based on Temporal Clues
Yuyao Ye (Peking University) · Ning Zhang (None) · Yang Zhao (Hefei University of Technology) · Hongbin Cao (ByteDance) · Ronggang Wang (Peking University Shenzhen Graduate School)
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Ziqiao Peng (Renmin University of China) · Wentao Hu (Beijing University of Posts and Telecommunications) · Yue Shi (Psyche AI Inc.) · Xiangyu Zhu (None) · Xiaomei Zhang (None) · Hao Zhao (Tsinghua University) · Jun He (Renmin University of China) · Hongyan Liu (Tsinghua University) · Zhaoxin Fan (Renmin University of China, Tsinghua University)
Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models
Samar Fares (None) · Karthik Nandakumar (Mohamed Bin Zayed University of Artificial Intelligence)
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi (Amazon) · Qi Dong (Amazon) · Luis Goncalves (California Institute of Technology) · Zhuowen Tu (University of California, San Diego) · Stefano Soatto (AWS)
Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya (None) · Aniruddha Kembhavi (Allen Institute for Artificial Intelligence) · Dhruv Batra (FAIR (Meta) and Georgia Tech) · Zsolt Kira (Georgia Institute of Technology) · Kuo-Hao Zeng (Allen Institute for Artificial Intelligence) · Luca Weihs (Allen Institute for Artificial Intelligence)
Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields
Haoyuan Wang (City University of Hong Kong) · Wenbo Hu (Tencent AI Lab) · Lei Zhu (City University of Hong Kong) · Rynson W.H. Lau (City University of Hong Kong)
3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation
Xingguang Zhong (Rheinische Friedrich-Wilhelms Universität Bonn) · Yue Pan (University of Bonn) · Cyrill Stachniss (University of Bonn) · Jens Behley (University of Bonn)
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Yuanhui Huang (Tsinghua University) · Wenzhao Zheng (Tsinghua University) · Borui Zhang (Tsinghua University) · Jie Zhou (None) · Jiwen Lu (Tsinghua University)
SUGAR: Pre-training 3D Visual Representation for Robotics
Shizhe Chen (INRIA) · Ricardo Garcia Pinel (INRIA) · Ivan Laptev (INRIA Paris) · Cordelia Schmid (Inria / Google)
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
Taoran Yi (Huazhong University of Science and Technology) · Jiemin Fang (Huawei Technologies Ltd.) · Junjie Wang (None) · Guanjun Wu (None) · Lingxi Xie (Huawei Technologies Ltd.) · Xiaopeng Zhang (Huawei Technologies Ltd.) · Wenyu Liu (Huazhong University of Science and Technology) · Qi Tian (Huawei Technologies Ltd.) · Xinggang Wang (Huazhong University of Science and Technology)
Active Generalized Category Discovery
Shijie Ma (Institute of Automation, Chinese Academy of Sciences) · Fei Zhu (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Zhun Zhong (University of Nottingham) · Xu-Yao Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Cheng-Lin Liu (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering
Shaowei Wang (Xi'an Jiaotong University) · Lingling Zhang (Xi'an Jiaotong University) · Longji Zhu (Xi'an Jiaotong University) · Tao Qin (Xi'an Jiaotong University) · Kim-Hui Yap (Nanyang Technological University) · Xinyu Zhang (None) · Jun Liu (Xi'an Jiaotong University)
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
Simone Peirone (Polytechnic Institute of Turin) · Francesca Pistilli (Polytechnic Institute of Turin) · Antonio Alliegro (Politecnico di Torino) · Giuseppe Averta (Polytechnic of Turin)
Compact 3D Gaussian Representation for Radiance Field
Joo Chan Lee (Sungkyunkwan University) · Daniel Rho (Korea Telecom Research) · Xiangyu Sun (None) · Jong Hwan Ko (Sungkyunkwan University (SKKU)) · Eunbyung Park (SKKU)
FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations
Christian Diller (Technische Universität München) · Thomas Funkhouser (Princeton University) · Angela Dai ()
FlowIE: Efficient Image Enhancement via Rectified Flow
Yixuan Zhu (Tsinghua University) · Wenliang Zhao (Automation, Tsinghua University) · Ao Li (Tsinghua University) · Yansong Tang (Tsinghua University) · Jie Zhou (None) · Jiwen Lu (Tsinghua University)
Combining Frame and GOP Embeddings for Neural Video Representation
Jens Eirik Saethre (ETH Zurich & Disney Research|Studios) · Roberto Azevedo (Disney Research, Disney) · Christopher Schroers (Disney Research|Studios, Disney)
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang (University of Science and Technology of China) · Xiaoyi Dong (Microsoft) · Pan Zhang (Shanghai Artificial Intelligence Laboratory) · Bin Wang (Shanghai AI Laboratory) · Conghui He (None) · Jiaqi Wang (Shanghai AI Laboratory) · Dahua Lin (The Chinese University of Hong Kong) · Weiming Zhang (University of Science and Technology of China) · Nenghai Yu (University of Science and Technology of China)
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
Yushi Huang (SenseTime) · Ruihao Gong (SenseTime) · Jing Liu () · Tianlong Chen (Massachusetts Institute of Technology) · Xianglong Liu (BUAA)
Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor
Jae Hyeon Park (Dongguk University) · Gyoomin Lee (Dongguk University) · Seunggi Park (Dongguk University) · Sung In Cho (Dongguk University)
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Shengqu Cai (ETH Zurich & Stanford University) · Duygu Ceylan (Adobe Systems) · Matheus Gadelha (Adobe Systems) · Chun-Hao P. Huang (Adobe Systems) · Tuanfeng Y. Wang (None) · Gordon Wetzstein (Stanford University)
Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments
Yinhua Piao (Seoul National University) · Sangseon Lee (Seoul National University) · Yijingxiu Lu (Seoul National University) · Sun Kim (Seoul National University)
Towards Understanding and Improving Adversarial Robustness of Vision Transformers
Samyak Jain () · Tanima Dutta (IIT BHU)
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Fang Kaipeng (None) · Jingkuan Song (University of Electronic Science and Technology of China) · Lianli Gao (University of Electronic Science and Technology of China, Tsinghua University) · Pengpeng Zeng (University of Electronic Science and Technology of China) · Zhi-Qi Cheng (Carnegie Mellon University) · Xiyao LI (Kuaishou Technology) · Heng Tao Shen (University of Electronic Science and Technology of China)
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
Yankai Jiang (Shanghai Artificial Intelligence Laboratory) · Zhongzhen Huang (None) · Rongzhao Zhang (Shanghai Artificial Intelligence Laboratory) · Xiaofan Zhang (Shanghai Jiao Tong University) · Shaoting Zhang (University of North Carolina at Charlotte)
Improved Self-Training for Test-Time Adaptation
Jing Ma (None)
Structure-Aware Sparse-View X-ray 3D Reconstruction
Yuanhao Cai (Johns Hopkins University) · Jiahao Wang (Johns Hopkins University) · Alan L. Yuille (Johns Hopkins University) · Zongwei Zhou (Johns Hopkins University) · Angtian Wang (Johns Hopkins University)
LangSplat: 3D Language Gaussian Splatting
Minghan Qin (Tsinghua University) · Wanhua Li (Harvard University) · Jiawei ZHOU (Tsinghua University) · Haoqian Wang (Tsinghua University) · Hanspeter Pfister (Harvard University)
Retrieval-Augmented Embodied Agents
Yichen Zhu (Midea Group) · Zhicai Ou (AI Innovation Center, Midea Group) · Xiaofeng Mou (Midea Group) · Jian Tang (Midea Group)
Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining
Xiang Chen (Nanjing University of Science and Technology) · Jinshan Pan (Nanjing University of Science and Technology) · Jiangxin Dong (Nanjing University of Science and Technology)
Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation
Lin Long (Zhejiang University) · Haobo Wang (Zhejiang University) · Zhijie Jiang (Zhejiang University) · Lei Feng (Nanyang Technological University) · Chang Yao (Zhejiang University) · Gang Chen (College of Computer Science and Technology, Zhejiang University) · Junbo Zhao (Zhejiang University)
Contextrast: Contextual Contrastive Learning for Semantic Segmentation
Changki Sung (Korea Advanced Institute of Science & Technology) · Wanhee Kim (Korea Advanced Institute of Science & Technology) · Jungho An (Korea Advanced Institute of Science & Technology) · WooJu Lee (KAIST) · Hyungtae Lim (KAIST) · Hyun Myung (KAIST)
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Hanrong Ye () · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology)
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
FeiFan Xu (None) · Rui Li (Shantou University) · Si Wu (South China University of Technology) · Yong Xu (Peng Cheng Laboratory) · Hau San Wong (City University of Hong Kong)
MonoCD: Monocular 3D Object Detection with Complementary Depths
Longfei Yan (Huazhong University of Science and Technology) · Pei Yan (Huazhong University of Science and Technology) · Shengzhou Xiong (Huazhong University of Science and Technology) · Xuanyu Xiang (Huazhong University of Science and Technology) · Yihua Tan (Huazhong University of Science and Technology)
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Yu Zeng (None) · Vishal M. Patel (Johns Hopkins University) · Haochen Wang (Toyota Technological Institute at Chicago) · Xun Huang (NVIDIA) · Ting-Chun Wang (NVIDIA) · Ming-Yu Liu (NVIDIA) · Yogesh Balaji (NVIDIA)
A Linear N-Point Solver for Line and Motion Estimation with Event Cameras
Ling Gao (ShanghaiTech University) · Daniel Gehrig (None) · Hang Su (None) · Davide Scaramuzza (University of Zurich) · Laurent Kneip (ShanghaiTech University)
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du (Nanyang Technological University) · Haoxin Li (Nanyang Technological University) · Xu Guo (Nanyang Technological University) · Boyang Li (Nanyang Technological University)
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Guanjun Wu (None) · Taoran Yi (Huazhong University of Science and Technology) · Jiemin Fang (Huawei Technologies Ltd.) · Lingxi Xie (Huawei Technologies Ltd.) · Xiaopeng Zhang (Huawei Technologies Ltd.) · Wei Wei (Huazhong University of Science and Technology) · Wenyu Liu (Huazhong University of Science and Technology) · Qi Tian (Huawei Technologies Ltd.) · Xinggang Wang (Huazhong University of Science and Technology)
Differentiable Information Bottleneck for Deterministic Multi-view Clustering
Xiaoqiang Yan () · Zhixiang Jin (Zhengzhou University) · Fengshou Han (Zhengzhou University) · Yangdong Ye (Zhengzhou University)
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Antoine Guédon (Ecole des Ponts ParisTech) · Vincent Lepetit (Ecole des Ponts ParisTech)
R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization
Kennard Chan (A*STAR) · Fayao Liu (Institute for Infocomm Research, A*STAR) · Guosheng Lin (Nanyang Technological University) · Chuan-Sheng Foo (Centre for Frontier AI Research, A*STAR) · Weisi Lin (Nanyang Technological University)
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li (None) · Biao Yang (Huazhong University of Science and Technology) · Qiang Liu (Kingsoft Office) · Zhiyin Ma (Huazhong University of Science and Technology) · Shuo Zhang (Huazhong University of Science and Technology) · Jingxu Yang (Kingsoft Office Corporation Limited) · Yabo Sun (Kingsoft Office) · Yuliang Liu (Huazhong University of Science and Technology) · Xiang Bai (Huazhong University of Science and Technology)
Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
Wenjing Wang (Peking University) · Huan Yang (01.AI) · Jianlong Fu (Microsoft) · Jiaying Liu (Peking University)
Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective
Jinjing Zhao (The University of Sydney) · Fangyun Wei (None) · Chang Xu (University of Sydney)
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Tom Van Wouwe (Stanford University) · Seunghwan Lee (Stanford University) · Antoine Falisse (Stanford University) · Scott Delp (Stanford University) · Karen Liu (Computer Science Department, Stanford University)
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
Jingbo Zhang (City University of Hong Kong) · Xiaoyu Li (Tencent AI Lab) · Qi Zhang (Tencent AI Lab) · Yan-Pei Cao (Tencent ARC Lab) · Ying Shan (Tencent) · Jing Liao (City University of Hong Kong)
CurveCloudNet: Processing Point Clouds with 1D Structure
Colton Stearns (None) · Alex Fu (Illumix) · Jiateng Liu (Department of Computer Science) · Jeong Joon Park (Stanford University) · Davis Rempe (NVIDIA) · Despoina Paschalidou (Stanford) · Leonidas Guibas (Stanford University)
Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
Guoqiang Liang (Hong Kong University of Science and Technology) · Kanghao Chen (Hong Kong University of Science and Technology) · Hangyu Li (Hong Kong University of Science and Technology) · Yunfan Lu (Hong Kong University of Science and Technology(GuangZhou)) · Lin Wang (Hong Kong University of Science and Technology)
Learning Visual Prompt for Gait Recognition
Kang Ma (Beijing Institute of Technology) · Ying Fu (None) · Chunshui Cao (Watrix Technology) · Saihui Hou (Beijing Normal University) · Yongzhen Huang (Beijing Normal University) · Dezhi Zheng (None)
FCS: Feature Calibration and Separation for Non-Exemplar Class Incremental Learning
Qiwei Li (Peking University) · Yuxin Peng (Peking University) · Jiahuan Zhou (Peking University)
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Junhao Zheng (Xi'an Jiaotong University) · Chenhao Lin (Xi'an Jiaotong University) · Jiahao Sun (Xi'an Jiaotong University) · Zhengyu Zhao (Xi'an Jiaotong University) · Qian Li (Xi'an Jiaotong University) · Chao Shen (Xi’an Jiaotong University)
Discovering and Mitigating Visual Biases through Keyword Explanation
Younghyun Kim (KAIST) · Sangwoo Mo (University of Michigan) · Minkyu Kim (KRAFTON, Inc.) · Kyungmin Lee (Korea Advanced Institute of Science & Technology) · Jaeho Lee (POSTECH) · Jinwoo Shin (Korea Advanced Institute of Science and Technology)
XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images
CHONG YIN (Hong Kong Baptist University) · Siqi Liu (Shenzhen Research Institute of Big Data) · Fei Lyu (Hong Kong Baptist University) · Jiahao Lu (Copenhagen University) · Sune Darkner (Copenhagen University) · Vincent Wong (The Chinese University of Hong Kong) · Pong C. Yuen (Hong Kong Baptist University)
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang (The University of Sydney, University of Sydney) · Kevin Lin (Microsoft) · Zhengyuan Yang (Microsoft) · Jianfeng Wang (Microsoft) · Linjie Li (Microsoft) · Chung-Ching Lin (Microsoft) · Zicheng Liu (Microsoft) · Lijuan Wang (Microsoft)
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
Mustafa Munir (The University of Texas at Austin) · William Avery (None) · Md Mostafijur Rahman (University of Texas at Austin) · Radu Marculescu (University of Texas, Austin)
MoML: Online Meta Adaptation for 3D Human Motion Prediction
Xiaoning Sun (Nanjing University of Science and Technology) · Huaijiang Sun (Nanjing University of Science and Technology) · Bin Li (Nanjing University of Science and Technology) · Dong Wei (Nanjing University of Science and Technology) · Weiqing Li (Nanjing University of Science and Technology) · Jianfeng Lu (Nanjing University of Science and Technology)
Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation
Yi Zhang (Beihang University) · Meng-Hao Guo (Tsinghua University) · Miao Wang (Beihang University) · Shi-Min Hu (Tsinghua University)
Improving Graph Contrastive Learning via Adaptive Positive Sampling
Jiaming Zhuo (Hebei University of Technology) · Feiyang Qin (None) · Can Cui (Hebei University of Technology) · Kun Fu (Hebei University of Technology) · Bingxin Niu (Hebei University of Technology) · Mengzhu Wang (Hebei University of Technology) · Yuanfang Guo (Beihang University) · Chuan Wang (Institute of Information Engineering) · Zhen Wang (None) · Xiaochun Cao (SUN YAT-SEN UNIVERSITY) · Liang Yang (Hebei University of Technology)
VILA: On Pre-training for Visual Language Models
Ji Lin (Massachusetts Institute of Technology) · Danny Yin (NVIDIA) · Wei Ping (NVIDIA) · Pavlo Molchanov (NVIDIA) · Mohammad Shoeybi (NVIDIA) · Song Han (Massachusetts Institute of Technology)
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao (King Abdullah University of Science and Technology (KAUST)) · Shuming Liu (KAUST) · Karttikeya Mangalam (University of California Berkeley) · Guocheng Qian (KAUST) · Fatimah Zohra (King Abdullah University of Science and Technology) · Abdulmohsen Alghannam (University of Virginia, Charlottesville) · Jitendra Malik (University of California at Berkeley) · Bernard Ghanem (KAUST)
Vision-and-Language Navigation via Causal Learning
Liuyi Wang (Tongji University) · Zongtao He (Tongji University) · Ronghao Dang (Tongji University) · Mengjiao Shen (Tongji University) · Chengju Liu (Tongji University) · Qijun Chen (Tongji University)
A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?
Galadrielle Humblot-Renaux (Aalborg University) · Sergio Escalera (Computer Vision Center) · Thomas B. Moeslund (Aalborg University)
Learning with Structural Labels for Learning with Noisy Labels
Noo-ri Kim (Sungkyunkwan University) · Jin-Seop Lee (Sungkyunkwan University) · Jee-Hyong Lee (Sungkyunkwan University)
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang (Tongji University) · Xiaotong Zhai (University of Warwick) · Zhongkai Zhao (National University of Singapore) · Yongshuo Zong (School of Informatics, University of Edinburgh) · Xin Wen (The University of Hong Kong) · Bingchen Zhao (University of Edinburgh)
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
Yibo Miao (Shanghai Jiaotong University) · Yu Lei (Shanghai Jiao Tong University) · Feng Zhou (Renmin University of China) · Zhijie Deng (Shanghai Jiaotong University)
PLGSLAM: Progressive Neural Scene Representation with Local to Global Bundle Adjustment
Tianchen Deng (None) · Guole Shen (None) · Tong Qin (Shanghai Jiaotong University) · Jianyu Wang (Shanghai Jiao Tong University) · Wentao Zhao (Shanghai Jiao Tong University) · Jingchuan Wang (None) · Danwei Wang (Nanyang Technological University) · Weidong Chen (Shanghai Jiao Tong University)
RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Tianhao Zhou (University of Electronic Science and Technology of China) · Li Haipeng (None) · Ziyi Wang (University of Electronic Science and Technology of China) · Ao Luo (Megvii Technology Inc.) · Chenlin Zhang (Moonshot AI, Ltd) · Jiajun Li (4Paradigm Technology Inc.) · Bing Zeng (None) · Shuaicheng Liu (None)
Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation
Huyong Wang (Shenzhen University) · Huisi Wu (Shenzhen University) · Jing Qin (Hong Kong Polytechnic University)
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
Jinseok Kim (KAIST / LG Electronics) · Tae-Kyun Kim (Imperial College London)
HashPoint: Accelerated Point Searching and Sampling for Neural Rendering
Jiahao Ma (Australian National University) · Miaomiao Liu (Australian National University) · David Ahmedt-Aristizabal (CSIRO) · Chuong Nguyen (None)
Three Pillars improving Vision Foundation Model Distillation for Lidar
Gilles Puy (valeo.ai) · Spyros Gidaris (Valeo.ai) · Alexandre Boulch (valeo.ai) · Oriane Siméoni (Valeo.ai) · Corentin Sautier (ENPC, Ecole Nationale des Ponts et Chaussées) · Patrick Pérez (None) · Andrei Bursuc (valeo.ai) · Renaud Marlet (INRIA)
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang (Tsinghua University) · Yuan Meng (Tsinghua University) · Jiacheng Jiang (Tsinghua University) · Shuzhao Xie (Tsinghua University) · Rongwei Lu (Tsinghua University) · Xinzhu Ma (University of Sydney) · Zhi Wang (SIGS, Tsinghua University) · Wenwu Zhu (Tsinghua University)
Model Inversion Robustness: Can Transfer Learning Help?
Sy-Tuyen Ho (Singapore University of Technology and Design) · Koh Jun Hao (Singapore University of Technology and Design) · Keshigeyan Chandrasegaran (Stanford University) · Ngoc-Bao Nguyen (Singapore University of Technology and Design) · Ngai-Man Cheung (Singapore University of Technology and Design)
Seamless Human Motion Composition with Blended Positional Encodings
German Barquero (Universitat de Barcelona) · Sergio Escalera (Computer Vision Center) · Cristina Palmero (Universitat de Barcelona)
Single Domain Generalization for Crowd Counting
Zhuoxuan Peng (The Hong Kong University of Science and Technology) · S.-H. Gary Chan (The Hong Kong University of Science and Technology)
Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking
Wei Cao (Technische Universität München) · Chang Luo (Technical University of Munich) · Biao Zhang (KAUST) · Matthias Nießner (Technical University of Munich) · Jiapeng Tang (Technische Universität München)
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
Junyan Ye (SUN YAT-SEN UNIVERSITY) · Qiyan Luo () · Jinhua Yu (Sun Yat-sen University, School of Geospatial Engineering and Science) · Huaping Zhong (SenseTime) · Zhimeng Zheng (Zhejiang University) · Conghui He (None) · Weijia Li (Sun Yat-sen University)
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed (Mohamed bin Zayed University of Artificial Intelligence) · Muhammad Maaz (Mohamed Bin Zayed University of Artificial Intelligence) · Sahal Shaji Mullappilly (Mohamed bin Zayed University of Artificial Intelligence) · Abdelrahman Shaker (Mohamed Bin Zayed University of Artificial Intelligence) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Hisham Cholakkal (MBZUAI) · Rao Anwer (Mohamed bin Zayed University of Artificial Intelligence) · Eric P. Xing (Mohamed bin Zayed University of AI) · Ming-Hsuan Yang (University of California at Merced) · Fahad Shahbaz Khan (MBZUAI; Linköping University)
Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
Keon Hee Park (Kyung Hee University) · Kyungwoo Song (Yonsei University) · Gyeong-Moon Park (Kyung Hee University)
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Mingxuan Liu (University of Trento) · Tyler Hayes (Naver Labs Europe) · Elisa Ricci (University of Trento) · Gabriela Csurka (None) · Riccardo Volpi (Naver Labs Europe)
Learning Large-Factor EM Image Super-Resolution with Generative Priors
Jiateng Shou (University of Science and Technology of China) · Zeyu Xiao (None) · Shiyu Deng (University of Science and Technology of China) · Wei Huang (University of Science and Technology of China) · ShiPeiyao (Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences) · Ruobing Zhang (Suzhou Institute of Biomedical Engineering and Technology) · Zhiwei Xiong (None) · Feng Wu (University of Science and Technology of China)
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Yucheng Suo (Zhejiang University) · Fan Ma (None) · Linchao Zhu (None) · Yi Yang (Zhejiang University)
Functional Diffusion
Biao Zhang (KAUST) · Peter Wonka (KAUST)
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
Liao Wang () · Kaixin Yao (ShanghaiTech University) · Chengcheng Guo (ShanghaiTech University) · Zhirui Zhang (ShanghaiTech University) · Qiang Hu (Shanghai Jiaotong University) · Jingyi Yu (Shanghai Tech University) · Lan Xu (ShanghaiTech University) · Minye Wu (KU Leuven)
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz (Boston University) · Yuanzhen Li (Massachusetts Institute of Technology) · Varun Jampani (Google Research) · Wei Wei (Google) · Tingbo Hou (Google Research) · Yael Pritch (Google Research) · Neal Wadhwa (Google) · Michael Rubinstein (Google) · Kfir Aberman (Google)
Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
Yu Liu (None) · Yaqi Cai (Dalian University of Technology) · Qi Jia (Dalian University of Technology) · Binglin Qiu (Dalian University of Technology) · Weimin Wang (Dalian University of Technology) · Nan Pu (University of Trento)
Clustering Propagation for Universal Medical Image Segmentation
Yuhang Ding (University of Technology Sydney) · Liulei Li (Zhejiang University) · Wenguan Wang (Zhejiang University) · Yi Yang (Zhejiang University)
Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening
Yule Duan (University of Electronic Science and Technology of China) · Xiao Wu (University of Electronic Science and Technology of China) · Haoyu Deng (University of Electronic Science and Technology of China) · Liang-Jian Deng (University of Electronic Science and Technology of China)
A Versatile Framework for Continual Test-Time Domain Adaptation: Balancing Discriminability and Generalizability
Xu Yang (Xi'an University of Electronic Science and Technology) · Xuan Chen (Xi'an University of Electronic Science and Technology) · Moqi Li (Xi'an University of Electronic Science and Technology) · Kun Wei (Xidian University) · Cheng Deng (Xidian University)
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He (Purdue University)
Device-Wise Federated Network Pruning
Shangqian Gao (University of Pittsburgh) · Junyi Li (University of Maryland, College Park) · Zeyu Zhang (Amazon AGI) · Yanfu Zhang (College of William and Mary) · Weidong Cai (The University of Sydney) · Heng Huang (University of Pittsburgh)
D$^4$M: Dataset Distillation via Disentangled Diffusion Model
Duo Su (University of Chinese Academy of Sciences) · Junjie Hou (University of Chinese Academy of Sciences) · Weizhi Gao (North Carolina State University) · Yingjie Tian (Chinese Academy of Sciences) · Bowen Tang (Huawei Technologies Ltd.)
Face2Diffusion for Fast and Editable Face Personalization
Kaede Shiohara (None) · Toshihiko Yamasaki (None)
Logarithmic Lenses: Exploring Log RGB Data for Image Classification
Bruce Maxwell (Northeastern University) · Sumegha Singhania (Northeastern University) · Avnish Patel (Northeastern University) · Rahul Kumar (Northeastern University) · Heather Fryling (Northeastern University) · Sihan Li (None) · Haonan Sun (Northeastern University) · Ping He (Northeastern University) · Zewen Li (Northeastern University)
Score-Guided Diffusion for 3D Human Recovery
Anastasis Stathopoulos (Rutgers University) · Ligong Han (Rutgers University) · Dimitris N. Metaxas (Rutgers)
Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion
Weijian Ma (Fudan University) · Shuaiqi Chen (Fudan University) · Yunzhong Lou (Fudan University) · Xueyang Li (Fudan University) · Xiangdong Zhou (Fudan University)
StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation
Yining Shi (Tsinghua University) · Kun JIANG (Tsinghua University) · Ke Wang (Didi Research) · Jiusi Li (Tongji University) · Yunlong Wang (Tsinghua University) · Mengmeng Yang (None) · Diange Yang (Tsinghua University)
Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology
Andrew Song (Brigham and Women's hospital) · Richard J. Chen (Harvard University) · Tong Ding (Harvard University) · Drew F. K. Williamson (Massachusetts General Hospital, Harvard University) · Guillaume Jaume (Harvard University) · Faisal Mahmood (Harvard University)
Specularity Factorization for Low Light Enhancement
Saurabh Saini (International Institute of Information Technology Hyderabad) · P. J. Narayanan (IIIT Hyderabad)
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang (Institute of Computing Technology, Chinese Academy of Sciences) · Zhulin An (Institute of Computing Technology, Chinese Academy of Sciences) · Libo Huang (None) · Junyu Bi () · XinQiang Yu (Institute of Computing Technology, Chinese Academy of Sciences) · Han Yang (University of the Chinese Academy of Sciences) · Boyu Diao (None) · Yongjun Xu (None)
Enhance Image Classification Via Inter-Class Image Mixup With Diffusion Model
Zhicai Wang (University of Science and Technology of China) · Longhui Wei (Huawei Cloud Technologies Ltd.) · Tan Wang (Nanyang Technological University) · Heyu Chen (None) · Yanbin Hao () · Xiang Wang (University of Science and Technology of China) · Xiangnan He (University of Science and Technology of China) · Qi Tian (Huawei Technologies Ltd.)
CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning
Shiyu Tian (Chongqing University) · Hongxin Wei (Southern University of Science and Technology) · Yiqun Wang (Chongqing University) · Lei Feng (Nanyang Technological University)
Just Add $\pi$! Pose Induced Video Transformers for Understanding Activities of Daily Living
Dominick Reilly (UNC Charlotte) · Srijan Das (University of North Carolina at Charlotte)
3DInAction: Understanding Human Actions in 3D Point Clouds
Yizhak Ben-Shabat (Technion, Israel Institute of Technology) · Oren Shrout (Faculty of Electrical And Computer Engineering - Technion, Israel) · Stephen Gould (Australian National University)
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou (Computer Science, Tsinghua University) · Chao Yang (Shanghai AI Laboratory) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Chengbin Quan (Tsinghua University) · Youjian Zhao (Tsinghua University)
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
Shuting He (Nanyang Technological University) · Henghui Ding (Fudan University)
Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion
Zixian Gao (University of Electronic Science and Technology of China) · Xun Jiang (University of Electronic Science and Technology of China) · Xing Xu (University of Electronic Science and Technology of China) · Fumin Shen (UESTC) · Yujie Li (Yangzhou University) · Heng Tao Shen (University of Electronic Science and Technology of China)
DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching
Shuzhe Wang (Aalto University) · Juho Kannala (University of Oulu) · Daniel Barath (ETHZ - ETH Zurich)
Multiplane Prior Guided Few-Shot Aerial Scene Rendering
Zihan Gao (Xidian University) · Licheng Jiao (Xi'an University of Electronic Science and Technology) · Lingling Li (Xidian University) · Xu Liu (Xidian University) · Fang Liu (Xidian University) · Puhua Chen () · Yuwei Guo (Xidian University)
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Zhen Xu (Zhejiang University) · Sida Peng (None) · Haotong Lin (None) · Guangzhao He (Zhejiang University) · Jiaming Sun (Image Derivative Inc.) · Yujun Shen (The Chinese University of Hong Kong) · Hujun Bao (Zhejiang University) · Xiaowei Zhou (None)
Context-Guided Spatio-Temporal Video Grounding
Xin Gu (None) · Heng Fan (University of North Texas) · Yan Huang (University of North Texas) · Tiejian Luo (University of the Chinese Academy of Sciences) · Libo Zhang (Institute of Software Chinese Academy of Sciences)
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Xin Zhou (Huazhong University of Science and Technology) · Dingkang Liang (Huazhong University of Science and Technology) · Wei Xu (Huazhong University of Science and Technology) · Xingkui Zhu (Huazhong University of Science and Technology) · Yihan Xu (Huazhong University of Science and Technology) · Zhikang Zou (Huazhong University of Science and Technology) · Xiang Bai (Huazhong University of Science and Technology)
Reconstruction-free Cascaded Adaptive Compressive Sensing
Chenxi Qiu (Nanjing University) · Tao Yue (Nanjing University) · Xuemei Hu (Nanjing University)
Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI
Chong Wang (Nanyang Technological University) · Lanqing Guo (Nanyang Technological University) · Yufei Wang (Nanyang Technological University) · Hao Cheng (Nanyang Technological University) · Yi Yu (Nanyang Technological University, Singapore) · Bihan Wen (Nanyang Technological University)
A Unified Approach for Text- and Image-guided 4D Scene Generation
Yufeng Zheng (ETH Zurich, MPI-IS) · Xueting Li (NVIDIA) · Koki Nagano (None) · Sifei Liu (NVIDIA) · Otmar Hilliges (None) · Shalini De Mello (NVIDIA Research)
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis (None) · Vincent Sitzmann (Massachusetts Institute of Technology) · Matthias Nießner (Technical University of Munich)
RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection
Ximiao Zhang (Capital Normal University) · Min Xu (Capital Normal University) · Xiuzhuang Zhou (Beijing University of Posts and Telecommunications)
Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
Zhengyue Zhao (University of Chinese Academy of Sciences) · Jinhao Duan (Drexel University) · Kaidi Xu (Drexel University) · Chenan Wang (Drexel University) · Rui Zhang (None) · Zidong Du (Institute of Computing Technology, Chinese Academy of Sciences) · Qi Guo (Institute of Computing Technology, Chinese Academy of Sciences) · Xing Hu (Chinese Academy of Sciences)
NetTrack: Tracking Highly Dynamic Objects with a Net
Guangze Zheng (The University of Hong Kong) · Shijie Lin (None) · Haobo Zuo (University of Hong Kong) · Changhong Fu (Tongji University) · Jia Pan (University of Hong Kong)
CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment
Hyeongmin Lee (Pohang University of Science and Technology) · Kyoungkook Kang (Pohang University of Science and Technology) · Jungseul Ok (POSTECH) · Sunghyun Cho (POSTECH)
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Jianyuan Wang (Oxford VGG) · Nikita Karaev (University of Oxford) · Christian Rupprecht (University of Oxford) · David Novotny (Facebook)
CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing
Zhen Guo (Northwestern Polytechnical University) · Hongping Gan (Northwest Polytechnical University Xi'an)
Video Recognition in Portrait Mode
Mingfei Han (University of Technology Sydney) · Linjie Yang (ByteDance Inc.) · Xiaojie Jin (ByteDance Inc./TikTok) · Jiashi Feng (ByteDance) · Xiaojun Chang (University of Technology Sydney) · Heng Wang (Bytedance)
Versatile Navigation under Partial Observability via Value-Guided Diffusion Policy
Gengyu Zhang (Illinois Institute of Technology) · Hao Tang (ETH Zurich and CMU) · Yan Yan (Illinois Institute of Technology)
Point, Segment and Count: A Generalized Framework for Object Counting
Zhizhong Huang (Fudan University) · Mingliang Dai (Fudan University) · Yi Zhang (Sichuan University) · Junping Zhang (Fudan University) · Hongming Shan (None)
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron (Google) · Ahmet Iscen (Google) · Alireza Fathi (Google) · Cordelia Schmid (Inria / Google)
Normalizing Flows on the Product Space of SO(3) Manifolds for Probabilistic Human Pose Modeling
Olaf Dünkel (Saarland Informatics Campus, Max-Planck Institute) · Tim Salzmann (Technische Universität München) · Florian Pfaff (University of Stuttgart)
GEARS: Local Geometry-aware Hand-object Interaction Synthesis
Keyang Zhou (Eberhard-Karls-Universität Tübingen) · Bharat Lal Bhatnagar (Meta) · Jan Lenssen (Saarland Informatics Campus, Max-Planck Institute) · Gerard Pons-Moll (University of Tübingen)
GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Junjie Wang (None) · Jiemin Fang (Huawei Technologies Ltd.) · Xiaopeng Zhang (Huawei Technologies Ltd.) · Lingxi Xie (Huawei Technologies Ltd.) · Qi Tian (Huawei Technologies Ltd.)
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Yawar Siddiqui (Technical University Munich) · Antonio Alliegro (Politecnico di Torino) · Alexey Artemov (Technische Universität München) · Tatiana Tommasi (Politecnico di Torino) · Daniele Sirigatti (Audi AG) · Vladislav Rosov (AUDI AG) · Angela Dai () · Matthias Nießner (Technical University of Munich)
3D-Aware Face Editing via Warping-Guided Latent Direction Learning
Yuhao Cheng (Shanghai Jiaotong University) · Zhuo Chen (Shanghai Jiaotong University) · Xingyu Ren (Shanghai Jiao Tong University) · Wenhan Zhu (None) · Zhengqin Xu (Shanghai Jiaotong University) · Di Xu (Huawei Technologies Ltd.) · Yang Changpeng (Huawei Cloud) · Yichao Yan (Shanghai Jiao Tong University)
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
Geonho Bang (Hanyang University) · Kwangjin Choi (Hanyang University) · Jisong Kim (Hanyang University) · Dongsuk Kum (Korea Advanced Institute of Science and Technology) · Jun Won Choi (Seoul National University)
MatFuse: Controllable Material Generation with Diffusion Models
Giuseppe Vecchio (University of Catania) · Renato Sortino (University of Catania) · Simone Palazzo (University of Catania) · Concetto Spampinato (University of Catania)
Global Latent Neural Rendering
Thomas Tanay (Huawei Technologies Ltd.) · Matteo Maggioni (Huawei Technologies Ltd.)
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu (None) · Bin Duan (Illinois Tech) · Weitai Kang (None) · Hao Tang (ETH Zurich and CMU) · Yan Yan (Illinois Institute of Technology)
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer (Technical University of Munich) · Bichen Wu (Facebook) · Edgar Schoenfeld (None) · Xiaoliang Dai (Facebook) · Ji Hou (Facebook) · Zijian He (None) · Artsiom Sanakoyeu (RL) · Peizhao Zhang (Facebook) · Sam Tsai (Meta) · Jonas Kohler (Facebook) · Christian Rupprecht (University of Oxford) · Daniel Cremers (Technical University Munich) · Peter Vajda (Facebook) · Jialiang Wang (Facebook)
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Subhadeep Koley (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Deeptanshu Sekhri (University of Surrey) · Aneeshan Sain (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Tao Xiang (University of Surrey) · Yi-Zhe Song (None)
ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images
Jinseo Jeong (Seoul National University) · Junseo Koo (Seoul National University) · Qimeng Zhang (Korea University) · Gunhee Kim (Seoul National University)
Epistemic Uncertainty Quantification For Pre-trained Neural Networks
Hanjing Wang (Rensselaer Polytechnic Institute) · Qiang Ji (Rensselaer Polytechnic Institute)
OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying (None) · Yixuan Yin (Tsinghua University) · Jinzhi Zhang (Electronic Engineering, Tsinghua University) · Fan Wang (Alibaba Group) · Tao Yu (Tsinghua University) · Ruqi Huang (Tsinghua Shenzhen International Graduate School/Tsinghua Berkeley Shenzhen Institute) · Lu Fang (Tsinghua University)
MRFS: Mutually Reinforcing Image Fusion and Segmentation
Hao Zhang (Wuhan University) · Xuhui Zuo (Wuhan University) · Jie Jiang (Tencent AI Lab) · Chunchao Guo (SUN YAT-SEN UNIVERSITY) · Jiayi Ma (Wuhan University)
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
Dale Decatur (University of Chicago) · Itai Lang (University of Chicago & Tel Aviv University) · Kfir Aberman (Google) · Rana Hanocka (University of Chicago)
Design2Cloth: 3D Cloth Generation from 2D Masks
Jiali Zheng (Imperial College London) · Rolandos Alexandros Potamias (Imperial College London) · Stefanos Zafeiriou (Imperial College London)
3D-LFM: Lifting Foundation Model
Mosam Dabhi () · László A. Jeni (Carnegie Mellon University) · Simon Lucey (University of Adelaide)
Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It
Adam Lilja (None) · Junsheng Fu (Zenseact) · Erik Stenborg (Chalmers University) · Lars Hammarstrand (Chalmers University of Technology)
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Han Qiu (Nanyang Technological University) · Jiaxing Huang (Nanyang Technological University) · Peng Gao (The Chinese University of Hong Kong) · Lewei Lu (SenseTime) · Xiaoqin Zhang (Wenzhou University) · Shijian Lu (Nanyang Technological University)
UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather
Haimei Zhao (The University of Sydney) · Jing Zhang (The University of Sydney) · Zhuo Chen (Tsinghua University) · Shanshan Zhao (JD Explore Academy) · Dacheng Tao (None)
Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification
Tingting Zheng (Harbin Institute of Technology) · Kui Jiang (Harbin Institute of Technology) · Hongxun Yao (Harbin Institute of Technology)
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi (None) · Lewei Yao (Harbin Institute of Technology) · Jiahui Gao (The University of Hong Kong) · Jipeng Zhang (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology) · Tong Zhang (UIUC)
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani (University of Michigan) · Amit Raj (Google) · Kevis-kokitsi Maninis (Google) · Abhishek Kar (Google) · Yuanzhen Li (Massachusetts Institute of Technology) · Michael Rubinstein (Google) · Deqing Sun (Google) · Leonidas Guibas (Stanford University) · Justin Johnson (University of Michigan) · Varun Jampani (Google Research)
View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning
Shilong Ou (Beijing University of Posts and Telecommunications) · Zhe Xue (Beijing University of Posts and Telecommunications) · Yawen Li (Beijing University of Posts and Telecommunications) · Meiyu Liang (Beijing University of Posts and Telecommunications) · Yuanqiang Cai (Beijing University of Posts and Telecommunications) · Junjiang Wu (None)
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Lihua Jing (University of the Chinese Academy of Sciences) · Rui Wang (Institute of Information Engineering) · Wenqi Ren (Sun Yat-Sen University) · Xin Dong (University of the Chinese Academy of Sciences) · Cong Zou (Institute of Information Engineering, CAS)
Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
Guopeng Li (Wuhan University) · Ming Qian (None) · Gui-Song Xia (Wuhan University)
EasyDrag: Efficient Point-based Manipulation on Diffusion Models
Xingzhong Hou (University of Chinese Academy of Sciences) · Boxiao Liu (Sensetime Research) · Yi Zhang (The Chinese University of Hong Kong) · Jihao Liu (The Chinese University of Hong Kong) · Yu Liu (The Chinese University of Hong Kong) · Haihang You (Institute of Computing Technology, Chinese Academy of Sciences)
Generating Illustrated Instructions
Sachit Menon (Columbia University) · Ishan Misra (Facebook) · Rohit Girdhar (Meta)
LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation
Ke Guo (HKU) · Zhenwei Miao (Alibaba Group) · Wei Jing (NetEase, Inc.) · Weiwei Liu (Huzhou Institute of Zhejiang University) · Weizi Li (University of Tennessee, Knoxville) · Dayang Hao (Cainiao) · Jia Pan (University of Hong Kong)
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Ye Yuan (NVIDIA Research) · Xueting Li (NVIDIA) · Yangyi Huang (Zhejiang University) · Shalini De Mello (NVIDIA Research) · Koki Nagano (None) · Jan Kautz (NVIDIA) · Umar Iqbal (None)
TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo (Politecnico di Milano) · Dan Casas (Universidad Rey Juan Carlos) · Elena Garces (Universidad Rey Juan Carlos) · Jorge Lopez-Moreno (Universidad Rey Juan Carlos)
Image Processing GNN: Breaking Rigidity in Super-Resolution
Yuchuan Tian (Peking University) · Hanting Chen (Huawei Technologies Ltd.) · Chao Xu (Peking University) · Yunhe Wang (Huawei Noah's Ark Lab)
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang (The University of Hong Kong) · Runyu Ding (Electrical and Electronic Engineering, University of Hong Kong) · Weipeng DENG (University of Hong Kong) · Zhe Wang (Sensetime Group Limited) · Xiaojuan Qi (University of Oxford)
LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
Hao Yang (Beijing Institute of Technology) · Liyuan Pan (Beijing Institute of Technology) · Yan Yang (ANU) · Richard Hartley (ANU / Google) · Miaomiao Liu (Australian National University)
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva (MPII) · Fadime Sener () · Edoardo Remelli (EPFL - EPF Lausanne) · Bugra Tekin (Meta) · Eric Sauser (Meta) · Bernt Schiele (Max Planck Institute for Informatics) · Shugao Ma (Meta)
Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation
Xiyi Chen (ETH Zürich) · Marko Mihajlovic (Swiss Federal Institute of Technology) · Shaofei Wang (None) · Sergey Prokudin (ETHZ - ETH Zurich) · Siyu Tang (ETH Zurich)
MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark
Sanghyun Woo (New York University) · Kwanyong Park (Electronics and Telecommunication Research Institute) · Inkyu Shin (Korea Advanced Institute of Science & Technology) · Myungchul Kim (Korea Advanced Institute of Science & Technology) · In So Kweon (Korea Advanced Institute of Science and Technology)
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Qihao Zhao (Beijing University of Chemical Technology) · Yalun Dai (Nanyang Technological University) · Hao Li (Northwest Polytechnical University) · Wei Hu (Beijing University of Chemical Technology) · Fan Zhang (Beijing University of Chemical Technology) · Jun Liu (Singapore University of Technology and Design (SUTD))
Riemannian Multinomial Logistics Regression for SPD Neural Networks
Ziheng Chen (University of Trento) · Yue Song (University of Trento) · Gaowen Liu (None) · Ramana Kompella (Cisco) · Xiaojun Wu (Jiangnan University) · Nicu Sebe (University of Trento)
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer
Hyeongjin Nam (None) · Daniel Jung (Seoul National University) · Gyeongsik Moon (None) · Kyoung Mu Lee (Seoul National University)
Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
Kanglong FAN (City University of Hong Kong) · Wen Wen (City University of Hong Kong) · Mu Li (The Chinese University of Hong Kong, Shenzhen) · YIFAN PENG (University of Hong Kong) · Kede Ma (City University of Hong Kong)
S$^2$MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering
Zhen Long (University of Electronic Science and Technology of China) · Qiyuan Wang (None) · Yazhou Ren (University of Electronic Science and Technology of China) · Yipeng Liu (University of Electronic Science and Technology of China) · Ce Zhu (University of Electronic Science and Technology of China)
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang (None) · gaohuan (Inchitech Company) · Ping Guo (Intel) · Limin Wang (Nanjing University)
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun (Seoul National University) · Dohoon Kim (Seoul National University) · Taesup Moon (Seoul National University)
Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization
Khiem Le (University of Notre Dame) · Tuan Long Ho (VinUniversity) · Cuong Do (None) · Danh Le-Phuoc (TU Berlin) · KOK SENG WONG (VinUniversity)
Unsupervised Gaze Representation Learning from Multi-view Face Images
Yiwei Bao (Beihang University) · Feng Lu (Beihang University, Tsinghua University)
PEEKABOO: Interactive Video Generation via Masked-Diffusion
Yash Jain (Microsoft) · Anshul Nasery (University of Washington) · Vibhav Vineet (Microsoft) · Harkirat Behl (Microsoft)
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao (Shanghai Jiao Tong University) · Jiangtong Li (Shanghai Jiao Tong University) · Li Niu () · Liqing Zhang (Shanghai Jiao Tong University)
MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion
Roy Kapon (Tel Aviv University) · Guy Tevet (Tel Aviv University) · Daniel Cohen-Or (Google) · Amit H. Bermano (Tel Aviv University, Technion)
Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector
Yifu Ding (None) · Weilun Feng (Beijing University of Aeronautics and Astronautics) · Chuyan Chen (Beijing University of Aeronautics and Astronautics) · Jinyang Guo (Beijing University of Aeronautics and Astronautics) · Xianglong Liu (BUAA)
From Coarse to Fine-Grained Open-Set Recognition
Nico Lang (University of Copenhagen) · Vésteinn Snæbjarnarson (Copenhagen University) · Elijah Cole (Altos Labs) · Oisin Mac Aodha (University of Edinburgh) · Christian Igel (University of Copenhagen) · Serge Belongie (University of Copenhagen)
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Wei-Ting Chen (National Taiwan University) · Gurunandan Krishnan (Snap Inc.) · Qiang Gao (Snap Inc.) · Sy-Yen Kuo (National Taiwan University) · Sizhuo Ma (Snap Inc.) · Jian Wang (Snap Inc.)
Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation
Haifeng Xia (Southeast University) · Siyu Xia (Southeast University) · Zhengming Ding (Tulane University)
RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control
Xiang Deng (Tsinghua University) · Zerong Zheng (Tsinghua University) · Yuxiang Zhang (Tsinghua University) · Jingxiang Sun (None) · Chao Xu (NNCosmos) · Xiaodong Yang (Li Auto) · Lizhen Wang (Tsinghua University) · Yebin Liu (Tsinghua University)
Towards Generalizable Multi-Object Tracking
Zheng Qin () · Le Wang (Xi'an Jiaotong University) · Sanping Zhou (Xi'an Jiaotong University) · Panpan Fu (Xi'an Jiaotong University) · Gang Hua (Wormpex AI Research) · Wei Tang (University of Illinois, Chicago)
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman (University of Texas at Austin) · Mustafa Munir (The University of Texas at Austin) · Radu Marculescu (University of Texas, Austin)
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Ziyi Yang (None) · Xinyu Gao (Zhejiang University) · Wen Zhou (University of Science and Technology of China) · Shaohui Jiao (Bytedance) · Yuqing Zhang (Zhejiang University) · Xiaogang Jin (State Key Lab of CAD&CG, Zhejiang University)
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
Sai Kumar Dwivedi (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Yu Sun (Harbin Institute of Technology) · Priyanka Patel (Max-Planck Institute) · Yao Feng (None) · Michael J. Black (University of Tübingen)
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
Bin Fu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Fanghua Yu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Anran Liu (None) · Zixuan Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Jie Wen (Harbin Institute of Technology, Shenzhen) · Junjun He (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) · Yu Qiao (Shanghai Artificial Intelligence Laboratory)
A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals
Jiangnan Tang (ShanghaiTech University) · Jingya Wang (ShanghaiTech University) · Kaiyang Ji (None) · Lan Xu (ShanghaiTech University) · Jingyi Yu (Shanghai Tech University) · Ye Shi (ShanghaiTech University)
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan (Beijing University of Technology) · Xuange Zhang (Beijing University of Technology) · Kun Liu (Beijing University of Posts and Telecommunications) · Bo Liu (Beijing University of Technology) · Chen Chen () · Jian Jin (China Academy of Information and Communications Technology) · Zhenzhen Jiao (Beijing Teleinfo Technology, CAICT)
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
Yuxin Chen (None) · Zongyang Ma (Institute of Automation, Chinese Academy of Sciences) · Ziqi Zhang (None) · Zhongang Qi (Tencent PCG ARC Lab) · Chunfeng Yuan (Institute of Automation, Chinese Academy of Sciences) · Bing Li (Institute of Automation, Chinese Academy of Sciences) · Junfu Pu (Tencent ARC Lab) · Ying Shan (Tencent) · Xiaojuan Qi (University of Oxford) · Weiming Hu (Institute of Automation, Chinese Academy of Sciences)
Locally Adaptive Neural 3D Morphable Models
Michail Tarasiou (Imperial College London) · Rolandos Alexandros Potamias (Imperial College London) · Eimear O' Sullivan (Huawei Technologies Ltd.) · Stylianos Ploumpis (Imperial College London) · Stefanos Zafeiriou (Imperial College London)
Revisiting Adversarial Training at Scale
Zeyu Wang (University of California, Santa Cruz) · Xianhang Li (University of California, Santa Cruz) · Hongru Zhu (None) · Cihang Xie (University of California, Santa Cruz)
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
Zijin Yin (None) · Kongming Liang (Beijing University of Posts and Telecommunications) · Bing Li (None) · Zhanyu Ma (Beijing University of Posts and Telecommunications) · Jun Guo (Beijing University of Posts and Telecommunications)
MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning
Mohamed Abdelfattah (VITA, EPFL) · Mariam Hassan (EPFL - EPF Lausanne) · Alex Alahi (None)
Logit Standardization in Knowledge Distillation
Shangquan Sun (University of Chinese Academy of Sciences) · Wenqi Ren (Sun Yat-Sen University) · Jingzhi Li (Institute of Information Engineering, Chinese Academy of Sciences) · Rui Wang (Institute of Information Engineering) · Xiaochun Cao (SUN YAT-SEN UNIVERSITY)
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan (ETH Zürich) · Maria Parelli (ETH Zurich) · Maria Kadoglou (ETHZ - ETH Zurich) · Xu Chen (Google) · Muhammed Kocabas (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Michael J. Black (University of Tübingen) · Otmar Hilliges (None)
Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity
Yuhang Chen (Wuhan University) · Wenke Huang (Wuhan University) · Mang Ye (Wuhan University)
Visual In-Context Prompting
Feng Li (The Hong Kong University of Science and Technology) · Qing Jiang (South China University of Technology) · Hao Zhang (The Hong Kong University of Science and Technology) · Shilong Liu (Tsinghua University) · Huaizhe Xu (Hong Kong University of Science and Technology) · Xueyan Zou (None) · Tianhe Ren (The International Digital Economy Academy) · Hongyang Li (South China University of Technology) · Lei Zhang (International Digital Economy Academy (IDEA)) · Chunyuan Li (Microsoft Research, Redmond) · Jianwei Yang (Microsoft Research) · Jianfeng Gao (Microsoft Research)
Overload: Latency Attacks on Object Detection for Edge Devices
Erh-Chung Chen (National Tsing Hua University) · Pin-Yu Chen (None) · I-Hsin Chung (IBM Research) · Che-Rung Lee (Department of Computer Science, National Tsing Hua University)
Dual DETRs for Multi-Label Temporal Action Detection
Yuhan Zhu () · Guozhen Zhang (Nanjing University) · Jing Tan (The Chinese University of Hong Kong) · Gangshan Wu (Nanjing University) · Limin Wang (Nanjing University)
UFC-Net: Unrolling Fixed-point Continuous Network for Deep Compressive Sensing
Xiaoyang Wang (Northwest Polytechnical University Xi'an) · Hongping Gan (Northwest Polytechnical University Xi'an)
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Haoyi Jiang (Huazhong University of Science and Technology) · Tianheng Cheng (Huazhong University of Science and Technology) · Naiyu Gao (HorizonRobotics Inc.) · Haoyang Zhang (Horizon Robotics) · Tianwei Lin (Horizon Robotics) · Wenyu Liu (Huazhong University of Science and Technology) · Xinggang Wang (Huazhong University of Science and Technology)
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu (KAUST) · Chenlin Zhang (Moonshot AI, Ltd) · Chen Zhao (King Abdullah University of Science and Technology (KAUST)) · Bernard Ghanem (KAUST)
Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
Yajing Liu (Shenyang Institute of Automation, Chinese Academy of Sciences) · Shijun Zhou (Shenyang Institute of Automation, Chinese Academy of Sciences) · Xiyao Liu (Shenyang Institute of Automation, Chinese Academy of Sciences) · Chunhui Hao (None) · Baojie Fan (Nanjing University of Posts and Telecommunications) · Jiandong Tian (The Shenyang Institute of Automation, Chinese Academy of Sciences)
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang (SUN YAT-SEN UNIVERSITY) · Xinyu Xiong (SUN YAT-SEN UNIVERSITY) · Jie Ma (SUN YAT-SEN UNIVERSITY) · Jichang Li (The University of Hong Kong) · Zequn Jie (Meituan) · Lin Ma (Meituan) · Guanbin Li (Sun Yat-sen University)
Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning
Youqi Pan (Peking University) · Wugen Zhou (Peking University) · Yingdian Cao (Peking University) · Hongbin Zha (Peking University)
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin (ShanghaiTech University) · Shihao Zou (University of Alberta) · Yuxuan Ge (ShanghaiTech University) · Zheng Tian (ShanghaiTech University)
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
Yizheng Gong (Xi'an Jiaotong-Liverpool University) · Siyue Yu (Xi'an Jiaotong-Liverpool University) · Xiaoyang Wang () · Jimin Xiao (Xi'an Jiaotong-Liverpool University)
Supervised Anomaly Detection for Complex Industrial Images
Aimira Baitieva (Valeo) · David Hurych (Valeo.ai) · Victor Besnier (Valeo.ai) · Olivier BERNARD (Valeo)
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
Denis Bobkov (Higher School of Economics) · Vadim Titov (Artificial Intelligence Research Institute) · Aibek Alanov (Artificial Intelligence Research Institute) · Dmitry Vetrov (Constructor University)
Interactive Continual Learning: Fast and Slow Thinking
Biqing Qi (Harbin Institute of Technology & Tsinghua University & Frontis.AI) · Xinquan Chen (Harbin Institute of Technology) · Junqi Gao (Harbin Institute of Technology) · Dong Li (Harbin Institute of Technology) · Jianxing Liu (Harbin Institute of Technology) · Ligang Wu (Harbin Institute of Technology) · Bowen Zhou (Tsinghua University)
Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection
Chen Chen (National University of Defense Technology) · Jiahao Qi (None) · Xingyue Liu (National University of Defense Technology) · Kangcheng Bin (National University of Defense Technology) · Ruigang Fu (None) · Xikun Hu (None) · Ping Zhong (National University of Defense Technology)
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
Lucas Nunes (University of Bonn) · Rodrigo Marcuzzi (University of Bonn) · Benedikt Mersch (University of Bonn) · Jens Behley (University of Bonn) · Cyrill Stachniss (University of Bonn)
Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization
Deng Li (Tianjin University) · Aming Wu (Xidian University) · Yaowei Wang (Pengcheng Laboratory) · Yahong Han (Tianjin University)
In-Context Matting
He Guo () · Zixuan Ye (None) · Zhiguo Cao () · Hao Lu (Huazhong University of Science and Technology)
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang (Shanghai AI Laboratory) · Kunchang Li (SIAT, UCAS) · Xinyuan Chen (Shanghai Artificial Intelligence Laboratory) · Yaohui Wang (Shanghai AI Laboratory) · Ziwei Liu (Nanyang Technological University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Yali Wang (SIAT, Chinese Academy of Sciences)
EscherNet: A Generative Model for Scalable View Synthesis
Xin Kong (Imperial College London) · Shikun Liu (Imperial College London) · Xiaoyang Lyu (University of Hong Kong) · Marwan Taher (The University of Sheffield) · Xiaojuan Qi (University of Oxford) · Andrew J. Davison (Imperial College London)
FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking
Seokju Cho (Korea University) · Gabriel Huang (None) · Seungryong Kim (Korea University) · Joon-Young Lee (Adobe Research)
MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction
Hiroaki Santo (Osaka University) · Fumio Okura (Osaka University) · Yasuyuki Matsushita (Osaka University)
LLaFS: When Large Language Models Meet Few-Shot Segmentation
Lanyun Zhu (Singapore University of Technology and Design) · Tianrun Chen (Zhejiang University) · Deyi Ji (None) · Jieping Ye (Alibaba Group) · Jun Liu (Singapore University of Technology and Design (SUTD))
Towards Memorization-Free Diffusion Models
Chen Chen (University of Sydney) · Daochang Liu (University of Sydney) · Chang Xu (University of Sydney)
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Ming Yan (Xiamen University) · Yan Zhang (Xiamen University) · Shuqiang Cai (Xiamen University) · Shuqi Fan (Xiamen University) · Xincheng Lin (Xiamen University) · Yudi Dai (Xiamen University) · Siqi Shen (Xiamen University) · Chenglu Wen (Xiamen University) · Lan Xu (ShanghaiTech University) · Yuexin Ma (ShanghaiTech University) · Cheng Wang (Xiamen University)
Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee (Yonsei University) · Suhwan Cho (Yonsei University) · Dogyoon Lee (Yonsei University) · Chaewon Park (Yonsei University) · Jungho Lee () · Sangyoun Lee (Yonsei University)
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
Zhiyuan Min (Zhejiang University) · Yawei Luo (Zhejiang University) · Wei Yang (Huazhong University of Science and Technology) · Yuesong Wang (None) · Yi Yang (Zhejiang University)
Unified Entropy Optimization for Open-Set Test-Time Adaptation
Zhengqing Gao (Institute of Automation, Chinese Academy of Sciences) · Xu-Yao Zhang (Institute of Automation, Chinese Academy of Sciences) · Cheng-Lin Liu (Institute of Automation, Chinese Academy of Sciences)
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li (None) · Yuying Ge (University of Hong Kong) · Yixiao Ge (Tencent) · Guangzhi Wang (National University of Singapore) · Rui Wang (Fudan University) · Ruimao Zhang (The Chinese University of Hong Kong (Shenzhen)) · Ying Shan (Tencent)
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
Tuo Feng (University of Technology Sydney) · Wenguan Wang (Zhejiang University) · Fan Ma (None) · Yi Yang (Zhejiang University)
Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model
Zelin Peng (None) · Zhengqin Xu (Shanghai Jiaotong University) · Zhilin Zeng (Shanghai Jiaotong University) · Lingxi Xie (Huawei Technologies Ltd.) · Qi Tian (Huawei Technologies Ltd.) · Wei Shen (None)
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Xiaolu Liu (Zhejiang University) · Song Wang (Zhejiang University) · Wentong Li (College of Computer Science and Technology, Zhejiang University) · Ruizi Yang (Zhejiang University) · Junbo Chen (UDEER AI PTE.LTD) · Jianke Zhu (Zhejiang University)
ViT-Lens: Towards Omni-modal Representations
Stan Weixian Lei (National University of Singapore) · Yixiao Ge (Tencent) · Kun Yi (Tencent ARC Lab) · Jianfeng Zhang (NUS) · Difei Gao (None) · Dylan Sun (University of Southern California) · Yuying Ge (University of Hong Kong) · Ying Shan (Tencent) · Mike Zheng Shou (National University of Singapore)
Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning
Joshua C. Zhao (Purdue University) · Ahaan Dabholkar (Purdue University) · Atul Sharma (Purdue University) · Saurabh Bagchi (KeyByte LLC)
Rewrite the Stars
Xu Ma (Northeastern University) · Xiyang Dai (Microsoft) · Yue Bai (Northeastern University) · Yizhou Wang (Northeastern University) · Yun Fu (Northeastern University)
MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
Nicolás Ugrinovic (Universitat Politècnica de Catalunya) · Boxiao Pan (Stanford University) · Georgios Pavlakos (University of Texas at Austin) · Despoina Paschalidou (Stanford) · Bokui Shen (Stanford University) · Jordi Sanchez-Riera (IRI-CSIC - Institut de Robòtica i Informàtica Industrial) · Francesc Moreno-Noguer (Universidad Politécnica de Cataluna) · Leonidas Guibas (Stanford University)
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao (The Chinese University of Hong Kong) · Yuxuan Hu (The Chinese University of Hong Kong) · Letian Wang (University of Toronto) · Guanglu Song (Sensetime X-Lab) · Steven L. Waslander (University of Toronto) · Yu Liu (The Chinese University of Hong Kong) · Hongsheng Li (The Chinese University of Hong Kong)
A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection
Hanshi Wang (Institute of Automation, Chinese Academy of Sciences) · Zhipeng Zhang (Didi Research) · Jin Gao (Institute of Automation, Chinese Academy of Sciences) · Weiming Hu (Institute of Automation, Chinese Academy of Sciences)
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao (Microsoft) · Haiping Wu (Microsoft) · Weijian Xu (Microsoft) · Xiyang Dai (Microsoft) · Houdong Hu (Microsoft) · Yumao Lu (Microsoft) · Michael Zeng (Microsoft) · Ce Liu (Microsoft) · Lu Yuan (Microsoft)
Adversarial Score Distillation: When score distillation meets GAN
Min Wei (Beijing University of Posts and Telecommunications) · Jingkai Zhou (Alibaba DAMO Academy) · Junyao Sun (South China University of Technology) · Xuesong Zhang (Beijing University of Posts and Telecommunications)
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
Zhenxin Li (Fudan University) · Shiyi Lan (NVIDIA CORPORATION) · Jose M. Alvarez (NVIDIA) · Zuxuan Wu (Fudan University)
HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
Caoyuan Ma (Wuhan University) · Yu-Lun Liu (National Yang Ming Chiao Tung University) · Zhixiang Wang (The University of Tokyo) · Wu Liu (None) · Xinchen Liu (None) · Zheng Wang (Wuhan University)
DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning
Haoran Xu () · Peixi Peng (Peking University) · Guang Tan (Sun Yat-sen University) · Yuan Li (Academy of Military Sciences) · Xinhai Xu (Academy of Military Sciences) · Yonghong Tian (Peking University)
Communication-Efficient Collaborative Perception via Information Filling with Codebook
Yue Hu (Shanghai Jiao Tong University) · Juntong Peng (Shanghai Jiao Tong University) · Sifei Liu (Shanghai Jiao Tong University) · Junhao Ge (Shanghai Jiao Tong University) · Si Liu (Beihang University) · Siheng Chen (Shanghai Jiao Tong University)
EventDance: Unsupervised Cross-modal Source-free Adaptation for Event-based Object Recognition
Xu Zheng (HKUST) · Lin Wang (Hong Kong University of Science and Technology)
Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion
Xunpeng Yi (None) · Han Xu (None) · Hao Zhang (Wuhan University) · Linfeng Tang (Wuhan University) · Jiayi Ma (Wuhan University)
Semantics-aware Motion Retargeting with Vision-Language Models
Haodong Zhang (None) · ZhiKe Chen () · Haocheng Xu (Zhejiang University) · Lei Hao (Huawei Noah's Ark Lab) · Xiaofei Wu (Huawei Technologies Ltd.) · Songcen Xu (Huawei Noah's Ark Lab) · Zhensong Zhang (Huawei Noah's Ark Lab) · Yue Wang (Zhejiang University) · Rong Xiong (Zhejiang University)
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Chenjie Cao (None) · Yunuo Cai (Fudan University) · Qiaole Dong (Fudan University) · Yikai Wang (None) · Yanwei Fu (Fudan University)
MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
Pengfei Xie (Southeast University) · Wenqiang Xu (Shanghai Jiao Tong University) · Tutian Tang (Shanghai Jiao Tong University) · Zhenjun Yu (Shanghai Jiao Tong University) · Cewu Lu (Shanghai Jiao Tong University)
PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images
Diantao Tu (None) · Hainan Cui (Institute of Automation, Chinese Academy of Sciences) · Xianwei Zheng (Wuhan University) · Shuhan Shen (Institute of Automation, Chinese Academy of Sciences)
Enhancing Post-training Quantization Calibration through Contrastive Learning
Yuzhang Shang (Illinois Institute of Technology) · Gaowen Liu (None) · Ramana Kompella (Cisco) · Yan Yan (Illinois Institute of Technology)
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Jiahe Li (Beijing University of Aeronautics and Astronautics) · Jiawei Zhang (Beijing University of Aeronautics and Astronautics) · Xiao Bai (Beijing University of Aeronautics and Astronautics) · Jin Zheng (Beijing University of Aeronautics and Astronautics) · Xin Ning (Institute of Semiconductors, Chinese Academy of Sciences) · Jun Zhou (Griffith University) · Lin Gu (RIKEN / the University of Tokyo)
DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans
Akash Sengupta (University of Cambridge) · Thiemo Alldieck (Google) · Nikos Kolotouros (None) · Enric Corona (Google) · Andrei Zanfir (Google) · Cristian Sminchisescu (Lund University)
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problem
Haoquan Zhang (South China University of Technology) · Ronggang Huang (South China University of Technology) · Yi Xie (South China University of Technology) · Huaidong Zhang (South China University of Technology)
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
Hongxia Li (ShanghaiTech University) · Wei Huang (RIKEN AIP) · Jingya Wang (ShanghaiTech University) · Ye Shi (ShanghaiTech University)
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
Jiequan Cui (Nanyang Technological University) · Beier Zhu (Nanyang Technological University) · Xin Wen (The University of Hong Kong) · Xiaojuan Qi (University of Oxford) · Bei Yu (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Hanwang Zhang (Nanyang Technological University)
Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing (Inria) · Jean Ponce (Ecole Normale Supérieure de Paris) · Cordelia Schmid (Inria / Google)
Multi-agent Collaborative Perception via Motion-aware Robust Communication Network
Shixin Hong (Tsinghua University) · Yu Liu (Tsinghua University) · Zhi Li (Shenzhen International Graduate School, Tsinghua University) · Shaohui Li (Tsinghua University) · You He (Tsinghua University)
Precise Image Editing via Recognition and Generation Tasks
Shelly Sheynin (Meta) · Adam Polyak (Meta) · Uriel Singer (Meta) · Yuval Kirstain (Tel Aviv University) · Amit Zohar (Meta) · Oron Ashual (Meta) · Devi Parikh (Meta / Georgia Tech) · Yaniv Taigman (Facebook)
Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring
Xiaoqian Lv (Harbin Institute of Technology) · Shengping Zhang (Harbin Institute of Technology) · Chenyang Wang (Harbin Institute of Technology) · Yichen Zheng (Huazhong University of Science and Technology) · Bineng Zhong (Guangxi Normal University) · Chongyi Li () · Liqiang Nie (Harbin Institute of Technology (Shenzhen))
Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training
Qian Li (Dalian University of Technology) · Yuxiao Hu (None) · Yinpeng Dong (Tsinghua University) · Dongxiao Zhang (Eastern Institute for Advanced Study) · Yuntian Chen (Eastern Institute for Advanced Study)
ColorPCR: Color Point Cloud Registration with Multi-Stage Geometric-Color Fusion
Juncheng Mu (Tsinghua University) · Lin Bie (Tsinghua University) · Shaoyi Du (Xi'an Jiaotong University) · Yue Gao (Tsinghua University)
Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
Kangwei Yan (Xi'an Jiaotong University) · Fei Wang (Xi'an Jiaotong University) · Bo Qian (None) · Han Ding (Xi'an Jiaotong University) · Jinsong Han (Zhejiang University) · Xing Wei (None)
MemFlow: Optical Flow Estimation and Prediction with Memory
Qiaole Dong (Fudan University) · Yanwei Fu (Fudan University)
FREE: Faster and Better Data-Free Meta-Learning
Yongxian Wei (Tsinghua University) · Zixuan Hu (Tsinghua University) · Zhenyi Wang (University of Maryland, College Park) · Li Shen (JD Explore Academy) · Chun Yuan (Tsinghua University) · Dacheng Tao (None)
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis (CVSSP, PAI, University of Surrey) · Judith Fan (Stanford University) · Yulia Gryaditskaya (CVSSP, PAI, University of Surrey)
Unsupervised Feature Learning with Emergent Data-Driven Prototypicality
Yunhui Guo (The University of Texas at Dallas) · Youren Zhang (University of Michigan - Ann Arbor) · Yubei Chen (New York University) · Stella X. Yu (None)
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Haoyu Chen (Hong Kong University of Science and Technology (Guangzhou)) · Wenbo Li (Huawei Technologies Ltd.) · Jinjin Gu (University of Sydney) · Jingjing Ren (The Hong Kong University of Science and Technology (Guangzhou)) · Haoze Sun (Tsinghua University) · Xueyi Zou (Huawei Technologies Ltd.) · Youliang Yan (Huawei Technologies Ltd.) · Zhensong Zhang (Huawei Noah's Ark Lab) · Lei Zhu (Hong Kong University of Science and Technology (Guangzhou) & HKUST)
Distilling ODE Solvers of Diffusion Models into Smaller Steps
Sanghwan Kim (ETHZ - ETH Zurich) · Hao Tang (ETH Zurich and CMU) · Fisher Yu (ETH Zurich)
3DiffTection: 3D Object Detection with Geometry-aware Diffusion Features
Chenfeng Xu (University of California Berkeley) · Huan Ling (Nvidia, University of Toronto) · Sanja Fidler (Department of Computer Science, University of Toronto) · Or Litany (NVIDIA / Technion)
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov (KAUST) · Willi Menapace (University of Trento) · Aliaksandr Siarohin (Snap Inc.) · Sergey Tulyakov (Snap Inc.)
XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
Xuanchi Ren (University of Toronto) · Jiahui Huang (None) · Xiaohui Zeng (Department of Computer Science, University of Toronto) · Ken Museth (NVIDIA) · Sanja Fidler (Department of Computer Science, University of Toronto) · Francis Williams (NVIDIA)
Probabilistic Human Mesh Estimation with Hypothesis Scoring
Yuan Xu (Peking University) · Xiaoxuan Ma (Peking University) · Jiajun Su (None) · Wentao Zhu (None) · Yu Qiao (Shanghai Jiao Tong University) · Yizhou Wang (Peking University)
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao (East China Normal University) · Zhuowan Li (Johns Hopkins University) · Yadong Lu (ECNU) · Alan L. Yuille (Johns Hopkins University) · Yan Wang (East China Normal University)
GRAM: Global Reasoning for Multi-Page VQA
Itshak Blau (Electrical Engineering Department, Technion - Israel Institute of Technology) · Sharon Fogel (Amazon) · Roi Ronen (Technion - Israel Institute of Technology) · Alona Golts (Amazon) · Shahar Tsiper (Amazon) · Elad Ben Avraham (Amazon) · Aviad Aberdam (Amazon AWS AI) · Roy Ganz (Technion - Israel Institute of Technology) · Ron Litman (Amazon AI Labs)
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
Wenlong Deng (University of British Columbia) · Christos Thrampoulidis (None) · Xiaoxiao Li (University of British Columbia)
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu (None) · Juncheng Li (Zhejiang University) · Longhui Wei (Huawei Cloud Technologies Ltd.) · Liang Pang (Institute of Computing Technology, Chinese Academy of Sciences) · Wentao Ye (Zhejiang University) · Bosheng Qin (Zhejiang University) · Siliang Tang (Zhejiang University) · Qi Tian (Huawei Technologies Ltd.) · Yueting Zhuang (Zhejiang University)
VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources
Fan Fei (Peking University) · Jiajun Tang (Peking University) · Ping Tan (Hong Kong University of Science and Technology) · Boxin Shi (Peking University)
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
Maxime Zanella (Université Catholique de Louvain) · Ismail Ben Ayed (ETS Montreal)
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Pengchong Qiao (Peking University) · Lei Shang (Alibaba Group) · Chang Liu (Tsinghua University) · Baigui Sun (Alibaba Group) · Xiangyang Ji (Tsinghua University) · Jie Chen (Peking University)
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
Xinyu Shi (None) · Zecheng Hao (Peking University) · Zhaofei Yu (Peking University)
SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction
Pin Tang (Shanghai Jiao Tong University) · Zhongdao Wang (Huawei Technologies Ltd.) · Guoqing Wang (Shanghai Jiao Tong University) · Jilai Zheng (Shanghai Jiao Tong University) · Xiangxuan Ren (Shanghai Jiao Tong University) · Bailan Feng (Institute of Automation, Chinese Academy of Sciences) · Chao Ma (Shanghai Jiao Tong University)
Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization
Ye Chen (Shanghai Jiao Tong University) · Bingbing Ni (Shanghai Jiao Tong University) · Jinfan Liu (Tongji University) · Xiaoyang Huang (Shanghai Jiao Tong University) · Xuanhong Chen (Shanghai Jiao Tong University)
OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees
Hakyeong Kim (Korea Advanced Institute of Science and Technology) · Andreas Meuleman (Korea Advanced Institute of Science and Technology) · Hyeonjoong Jang (None) · James Tompkin (Brown University) · Min H. Kim (KAIST)
Extreme Point Supervised Instance Segmentation
Hyeonjun Lee (POSTECH) · Sehyun Hwang (POSTECH) · Suha Kwak (POSTECH)
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He (Tsinghua University) · Qiaochu Huang (Tsinghua University) · Zhensong Zhang (Huawei Noah's Ark Lab) · Zhiwei Lin (Tsinghua Shenzhen International Graduate School) · Zhiyong Wu (Tsinghua University) · Sicheng Yang () · Minglei Li (Huawei Cloud Computing Technologies Ltd.) · Zhiyi Chen (Huawei Technologies Ltd.) · Songcen Xu (Huawei Noah's Ark Lab) · Xiaofei Wu (Huawei Technologies Ltd.)
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang (The University of Hong Kong) · Yukun Huang (University of Science and Technology of China) · Xiaoyang Wu (The University of Hong Kong) · Yuan-Chen Guo (Tsinghua University) · Song-Hai Zhang (Tsinghua University) · Hengshuang Zhao (The University of Hong Kong) · Tong He (Shanghai AI Lab) · Xihui Liu (The University of Hong Kong)
Degree-of-Freedom Matters: Inferring Dynamics from Point Trajectories
Yan Zhang (ETH Zurich) · Sergey Prokudin (ETHZ - ETH Zurich) · Marko Mihajlovic (Swiss Federal Institute of Technology) · Qianli Ma (NVIDIA Research) · Siyu Tang (ETH Zurich)
ActiveDC: Distribution Calibration for Active Finetuning
Wenshuai Xu () · Zhenghui Hu (Hangzhou Innovation Institute, Beihang University) · Yu Lu (Beijing University of Aeronautics and Astronautics) · Jinzhou Meng (Beijing University of Aeronautics and Astronautics) · Qingjie Liu (None) · Yunhong Wang (Beihang University)
KVQ: Kwai Video Quality Assessment for Short-form Videos
Yiting Lu (University of Science and Technology of China) · Xin Li (None) · Yajing Pei (None) · Kun Yuan (Kuaishou Technology) · Qizhi Xie (None) · Yunpeng Qu (None) · Ming Sun (Kuaishou Tech) · Chao Zhou (Peking University) · Zhibo Chen (University of Science and Technology of China)
Bidirectional Autoregressive Diffusion Model for Dance Generation
Canyu Zhang (University of South Carolina) · Youbao Tang (PAII Inc.) · Ning Zhang (PAII Inc.) · Ruei-Sung Lin (PAII Inc.) · Mei Han (PAII Inc.) · Jing Xiao (Pingan Group) · Song Wang (University of South Carolina)
CoSeR: Bridging Image and Language for Cognitive Super-Resolution
Haoze Sun (Tsinghua University) · Wenbo Li (Huawei Technologies Ltd.) · Jianzhuang Liu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Haoyu Chen (Hong Kong University of Science and Technology (Guangzhou)) · Renjing Pei (Huawei Technologies Ltd.) · Xueyi Zou (Huawei Technologies Ltd.) · Youliang Yan (Huawei Technologies Ltd.) · Yujiu Yang (Tsinghua University)
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Subhadeep Koley (University of Surrey) · Ayan Kumar Bhunia (University of Surrey, United Kingdom) · Aneeshan Sain (University of Surrey) · Pinaki Nath Chowdhury (University of Surrey) · Tao Xiang (University of Surrey) · Yi-Zhe Song (None)
NeRF Analogies - Example-Based Visual Attribute Transfer for NeRFs
Michael Fischer (University College London) · Zhengqin Li (Facebook) · Thu Nguyen-Phuoc (Reality Labs Research, Meta) · Aljaž Božič (Facebook) · Zhao Dong (Meta RL Research) · Carl Marshall (Reality Labs Research) · Tobias Ritschel (University College London)
InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
Yan-Shuo Liang (Nanjing University) · Wu-Jun Li (Nanjing University)
Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Zhiwen Yan (National University of Singapore) · Weng Fei Low () · Yu Chen (National University of Singapore) · Gim Hee Lee (National University of Singapore)
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Jiwoo Chung (Sungkyunkwan University) · Sangeek Hyun (Sungkyunkwan University) · Jae-Pil Heo (Sungkyunkwan University)
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
Dian Zheng (None) · Xiao-Ming Wu (Sun Yat-sen University) · Shuzhou Yang (Peking University) · Jian Zhang (Peking University) · Jian-Fang Hu (Sun Yat-sen University) · Wei-Shi Zheng (Sun Yat-sen University)
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
Xinshun Wang (Sun Yat-sen University) · Zhongbin Fang (Sun Yat-sen University) · Xia Li (Department of Computer Science, ETH Zurich) · Xiangtai Li (Nanyang Technological University) · Chen Chen () · Mengyuan Liu (Sun Yat-sen University)
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun Reddy (Johns Hopkins University) · William Paul (None) · Corban Rivera (Johns Hopkins University Applied Physics Laboratory) · Ketul Shah (Johns Hopkins University) · Celso M. de Melo (University of Southern California) · Rama Chellappa (Johns Hopkins University)
Fast Adaptation for Human Pose Estimation via Meta-Optimization
Shengxiang Hu (None) · Huaijiang Sun (Nanjing University of Science and Technology) · Bin Li (Nanjing University of Science and Technology) · Dong Wei (Nanjing University of Science and Technology) · Weiqing Li (Nanjing University of Science and Technology) · Jianfeng Lu (Nanjing University of Science and Technology)
"Previously on ..." From Recaps to Story Summarization
Aditya Kumar Singh (International Institute of Information Technology, Hyderabad) · Dhruv Srivastava (International Institute of Information Technology (IIIT-H), Hyderabad) · Makarand Tapaswi (IIIT Hyderabad, Wadhwani AI)
Generating Non-Stationary Textures using Self-Rectification
Yang Zhou (Shenzhen University) · Rongjun Xiao (None) · Dani Lischinski (The Hebrew University of Jerusalem, Israel) · Daniel Cohen-Or (Google) · Hui Huang (Shenzhen University)
SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection
Junsu Kim (Ulsan National Institute of Science and Technology) · Hoseong Cho (None) · Jihyeon Kim (Ulsan National Institute of Science and Technology) · Yihalem Tiruneh (Ulsan National Institute of Science and Technology) · Seungryul Baek (UNIST)
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär (None) · Neil Houlsby (Google) · Mostafa Dehghani (Google DeepMind) · Manoj Kumar (Google DeepMind)
1-Lipschitz Layers Compared: Memory, Speed, and Certifiable Robustness
Bernd Prach (ISTA) · Fabio Brau (Scuola Superiore Sant'Anna Pisa) · Giorgio Buttazzo (Scuola Superiore Sant'Anna Pisa) · Christoph Lampert (Institute of Science and Technology Austria)
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong (Korea Advanced Institute of Science and Technology) · Geon Yeong Park (Korea Advanced Institute of Science and Technology) · Jong Chul Ye (Korea Advanced Institute of Science and Technology)
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
Jiawen Zhu (Singapore Management University) · Choubo Ding (University of Adelaide) · Yu Tian (None) · Guansong Pang (Singapore Management University)
L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream
Jingtao Sun (National University of Singapore) · Yaonan Wang (Hunan University) · Mingtao Feng (Xidian University) · Yulan Guo (Sun Yat-sen University) · Ajmal Mian (University of Western Australia) · Mike Zheng Shou (National University of Singapore)
BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
Qihang Zhang (The Chinese University of Hong Kong) · Yinghao Xu (Chinese University of Hong Kong) · Yujun Shen (The Chinese University of Hong Kong) · Bo Dai (Shanghai AI Laboratory) · Bolei Zhou (University of California, Los Angeles) · Ceyuan Yang (The Chinese University of Hong Kong)
GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
Yingwenqi Jiang (None) · Jiadong Tu (None) · Yuan Liu (The University of Hong Kong) · Xifeng Gao (Tencent America) · Xiaoxiao Long (The University of Hong Kong) · Wenping Wang (Texas A&M University - College Station) · Yuexin Ma (ShanghaiTech University)
Scaling Laws for Data Filtering: Data Curation cannot be Compute Agnostic
Sachin Goyal (Carnegie Mellon University) · Pratyush Maini (Carnegie Mellon University) · Zachary Lipton (Carnegie Mellon University) · Aditi Raghunathan (Carnegie Mellon University) · Zico Kolter (Carnegie Mellon University)
PoNQ: a Neural QEM-based Mesh Representation
Nissim Maruani (Inria) · Maks Ovsjanikov (Ecole Polytechnique, France) · Pierre Alliez (INRIA) · Mathieu Desbrun (INRIA)
Representing Signs as Language: A New Method for Sign Language Translation from Videos
Jia Gong (Singapore University of Technology and Design) · Lin Geng Foo (Singapore University of Technology and Design) · Yixuan He (Singapore University of Technology and Design) · Hossein Rahmani (Lancaster University) · Jun Liu (Singapore University of Technology and Design (SUTD))
HIPTrack: Visual Tracking with Historical Prompts
Wenrui Cai (Beihang University) · Qingjie Liu (None) · Yunhong Wang (Beihang University)
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
Shunsuke Yasuki (Rikkyo University) · Masato Taki (Rikkyo University)
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang (Zhejiang University & Westlake University) · Biao Gong (Alibaba Group) · Yutong Feng (Alibaba Group) · Xi Chen (The University of Hong Kong) · Yuqian Fu (Fudan University) · Yu Liu (Alibaba Group) · Donglin Wang (Westlake University)
Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning
Pehuen Moure (ETH Zurich) · Longbiao Cheng (Institute of Neuroinformatics, University of Zurich and ETH Zurich) · Joachim Ott (Swiss Federal Institute of Technology) · Zuowen Wang (Institute of Neuroinformatics, University of Zurich and ETH Zurich) · Shih-Chii Liu (University of Zurich and ETH Zurich)
Robust Noisy Correspondence Learning with Equivariant Similarity Consistency
Yuchen Yang (Xi'an University of Electronic Science and Technology) · Erkun Yang (None) · Likai Wang (Xi'an University of Electronic Science and Technology) · Cheng Deng (Xidian University)
PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video
Dong Wu (None) · Zike Yan (Tsinghua University) · Hongbin Zha (Peking University)
Boosting Flow-based Generative Super-Resolution Models via Learned Prior
Li-Yuan Tsao (National Tsing Hua University) · Yi-Chen Lo (National Tsing Hua University) · Chia-Che Chang (MediaTek) · Hao-Wei Chen (National Tsing Hua University) · Roy Tseng (MediaTek) · Chien Feng (Department of Computer Science, National Tsing Hua University) · Chun-Yi Lee (National Tsing Hua University)
Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man (Department of Computer Science, University of Illinois at Urbana-Champaign) · Liang-Yan Gui (UIUC) · Yu-Xiong Wang (None)
Directed Decentralized Collaboration for Personalized Federated Learning
Yingqi Liu (None) · Yifan Shi (Tsinghua University) · Qinglun Li (National University of Defense Technology) · Baoyuan Wu (The Chinese University of Hong Kong, Shenzhen) · Xueqian Wang (Tsinghua University) · Li Shen (JD Explore Academy)
Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
Yiqun Mei (None) · Yu Zeng (None) · He Zhang (Adobe Systems) · Zhixin Shu (Adobe Systems) · Xuaner Zhang (Adobe) · Sai Bi (Adobe Systems) · Jianming Zhang (Adobe Systems) · HyunJoon Jung (Adobe Systems) · Vishal M. Patel (Johns Hopkins University)
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo (Fudan University) · Zhineng Chen (Fudan University) · Peng Zhou (Amazon) · Zuxuan Wu (Fudan University) · Xieping Gao (None) · Yu-Gang Jiang (Fudan University)
Task-Driven Wavelets using Constrained Empirical Risk Minimization
Eric Marcus (Netherlands Cancer Institute) · Ray Sheombarsing (None) · Jan-Jakob Sonke (Netherlands Cancer Institute) · Jonas Teuwen (Netherlands Cancer Institute)
Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision
Xin Juan (None) · Kaixiong Zhou (Rice University) · Ninghao Liu (University of Georgia) · Tianlong Chen (Massachusetts Institute of Technology) · Xin Wang (Jilin University)
AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval
Sixing Yan (None) · William K. Cheung (Hong Kong Baptist University) · Ivor Tsang (A*STAR) · Wan Hang Keith Chiu (Queen Elizabeth Hospital) · Tong Terence (The Chinese University of Hong Kong) · Ka Chun Cheung (NVIDIA) · Simon See (NVIDIA)
Text-to-3D using Gaussian Splatting
Zilong Chen (Tsinghua University) · Feng Wang (Tsinghua University) · Yikai Wang (Tsinghua University) · Huaping Liu (Tsinghua University)
Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion
Naishan Zheng (University of Science and Technology of China) · Man Zhou (University of Science and Technology of China) · Jie Huang (University of Science and Technology of China) · Junming Hou (Southeast University) · Haoying Li (Zhejiang University) · Yuan Xu (Nanyang Technological University) · Feng Zhao (University of Science and Technology of China)
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
Jihyun Lee (KAIST) · Shunsuke Saito (Reality Labs Research) · Giljoo Nam (Meta) · Minhyuk Sung (KAIST) · Tae-Kyun Kim (Imperial College London)
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan (Massachusetts Institute of Technology) · Kaifeng Chen (Google) · Dilip Krishnan (Google) · Dina Katabi (Massachusetts Institute of Technology) · Phillip Isola (None) · Yonglong Tian (Google)
Egocentric Full Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement
Jian Wang (Max Planck Institute for Informatics) · Zhe Cao (None) · Diogo Luvizon (Saarland Informatics Campus, Max-Planck Institute) · Lingjie Liu (Saarland Informatics Campus, Max-Planck Institute) · Kripasindhu Sarkar (Google) · Danhang Tang (Google Inc.) · Thabo Beeler (Google) · Christian Theobalt (MPI Informatik)
MMA: Multi-Modal Adapter for Vision-Language Models
Lingxiao Yang (Sun Yat-sen University) · Ru-Yuan Zhang (None) · Yanchen Wang (Stanford University) · Xiaohua Xie (Sun Yat-sen University)
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment
Zheren Fu (University of Science and Technology of China) · Lei Zhang (University of Science and Technology of China) · Hou Xia (University of Science and Technology of China) · Zhendong Mao (None)
Blind Image Quality Assessment Based on Geometric Order Learning
Nyeong-Ho Shin (None) · Seon-Ho Lee (None) · Chang-Su Kim (Korea University)
Unsupervised Deep Unrolling Networks for Phase Unwrapping
Zhile Chen (South China University of Technology) · Yuhui Quan (South China University of Technology) · Hui Ji (National University of Singapore)
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen (Osaka University) · Yusuke Hirota (Osaka University) · Mayu Otani (None) · Noa Garcia (Osaka University) · Yuta Nakashima (Osaka University)
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Kiana Ehsani (Allen Institute for Artificial Intelligence) · Tanmay Gupta (Allen Institute for Artificial Intelligence) · Rose Hendrix (Allen Institute for Artificial Intelligence) · Jordi Salvador (Allen Institute for AI) · Luca Weihs (Allen Institute for Artificial Intelligence) · Kuo-Hao Zeng (Allen Institute for Artificial Intelligence) · Kunal Singh Singh (None) · Yejin Kim (Allen Institute for Artificial Intelligence) · Winson Han (Allen Institute for Artificial Intelligence) · Alvaro Herrasti (Allen Institute for Artificial Intelligence) · Ranjay Krishna (University of Washington) · Dustin Schwenk (Allen Institute for Artificial Intelligence) · Eli VanderBilt (University of Idaho) · Aniruddha Kembhavi (Allen Institute for Artificial Intelligence)
What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Yihua Cheng (University of Birmingham) · Yaning Zhu (Huazhong University of Science and Technology) · Zongji Wang (Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences) · hongquan hao (calmcar) · Liu wei (Jiangsu University of Science and Technology) · Shiqing Cheng (Zhejiang University) · Xi Wang (CalmCar Vision System) · Hyung Jin Chang (University of Birmingham)
HUGS: Human Gaussian Splatting
Muhammed Kocabas (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Jen-Hao Rick Chang (Apple) · James Gabriel (Apple) · Oncel Tuzel (Apple) · Anurag Ranjan (Apple)
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Shunyuan Zheng (Harbin Institute of Technology) · Boyao ZHOU (Tsinghua University) · Ruizhi Shao (Tsinghua University, Tsinghua University) · Boning Liu (Department of Automation, Tsinghua University) · Shengping Zhang (Harbin Institute of Technology) · Liqiang Nie (Harbin Institute of Technology (Shenzhen)) · Yebin Liu (Tsinghua University)
Commonsense Prototype for Outdoor Unsupervised 3D Object Detection
Hai Wu (Xiamen University) · Shijia Zhao (Xiamen University) · Xun Huang (Xiamen University) · Chenglu Wen (Xiamen University) · Xin Li (None) · Cheng Wang (Xiamen University)
Rapid Motor Adaptation for Robotic Manipulator Arms
Yichao Liang (None) · Kevin Ellis (Cornell University) · João F. Henriques (University of Oxford)
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Junwen Xiong (Northwestern Polytechnical University) · Peng Zhang (Northwest Polytechnical University Xi'an) · Tao You (Northwest Polytechnical University Xi'an) · Chuanyue Li (Northwestern Polytechnical University, Northwest Polytechnical University Xi'an) · Wei Huang (Nanchang University) · Yufei Zha (Northwestern Polytechnical University)
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen (HiDream.ai) · Yingwei Pan (HiDream.ai) · haibo yang (Fudan University) · Ting Yao (JD AI Research) · Tao Mei (JD Explore Academy)
SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
Kejia Yin (None) · Varshanth Rao (ModiFace - A L'Oreal Group Company) · Ruowei Jiang (ModiFace) · Xudong Liu (None) · Parham Aarabi (Toronto University) · David B. Lindell (University of Toronto)
TurboSL: Dense, Accurate and Fast 3D by Neural Inverse Structured Light
Parsa Mirdehghan (University of Toronto) · Maxx Wu (University of Toronto) · Wenzheng Chen (University of Toronto) · David B. Lindell (University of Toronto) · Kiriakos Kutulakos (University of Toronto)
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin (City University of Hong Kong) · Haoli Bai (Huawei Technologies Ltd.) · Zhili Liu (Hong Kong University of Science and Technology) · Lu Hou (Huawei Technologies Ltd.) · Muyi Sun (Institute of automation, Chinese Academy of Sciences) · Linqi Song (City University of Hong Kong) · Ying Wei (City University of Hong Kong) · Zhenan Sun (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation
Dong Lao (University of California, Los Angeles) · Congli Wang (University of California, Berkeley) · Alex Wong (Yale University) · Stefano Soatto (UCLA)
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction
Yi Xu (Northeastern University) · Yun Fu (Northeastern University)
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni (Tsinghua University) · Yulin Wang (Tsinghua University, Tsinghua University) · Renping Zhou (Tsinghua University) · Jiayi Guo (Tsinghua University, Tsinghua University) · Jinyi Hu (Tsinghua University, Tsinghua University) · Zhiyuan Liu (Tsinghua University) · Shiji Song (Tsinghua University, Tsinghua University) · Yuan Yao (Tsinghua University) · Gao Huang (Tsinghua University, Tsinghua University)
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen (University of Texas at Austin) · Haris Vikalo (University of Texas, Austin)
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou (None) · You Li (Zhejiang University) · Fan Ma (None) · Xiaoting Zhang (Huawei Technologies Ltd.) · Yi Yang (Zhejiang University)
CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification
Yuanmin Huang (Fudan University) · Mi Zhang (Fudan University) · Daizong Ding (Huawei Technologies Ltd.) · Erling Jiang (Fudan University) · Zhaoxiang Wang (Fudan University) · Min Yang (Fudan University)
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
Chenyang Wang (Harbin Institute of Technology) · Zerong Zheng (Tsinghua University) · Tao Yu (Tsinghua University, Tsinghua University) · Xiaoqian Lv (Harbin Institute of Technology) · Bineng Zhong (Guangxi Normal University) · Shengping Zhang (Harbin Institute of Technology) · Liqiang Nie (Harbin Institute of Technology (Shenzhen))
LiSA: LiDAR Localization with Semantic Awareness
Bochun Yang (Xiamen University) · Zijun Li (Xiamen University) · Wen Li (School of Informatics, Xiamen University) · zhipeng cai (Intel Labs) · Chenglu Wen (Xiamen University) · Yu Zang (Xiamen University) · Matthias Mueller (None) · Cheng Wang (Xiamen University)
Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha (Indian Institute of Technology Bombay) · Ankit Jha (Indian Institute of Technology Bombay) · Shirsha Bose (Technische Universität München) · Ashwin Nair (Indian Institute of Science Education and Research Thiruvananthapuram) · Moloud Abdar (Deakin University) · Biplab Banerjee (IIT Bombay)
Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images
WEI SHAO (Nanjing University of Aeronautics and Astronautics) · YangYang Shi (Nanjing University of Aeronautics and Astronautics) · Daoqiang Zhang (Nanjing University of Aeronautics and Astronautics) · Junjie Zhou (Nanjing University of Aeronautics and Astronautics) · Peng Wan (Nanjing University of Aeronautics and Astronautics)
Diffusion-based Blind Text Image Super-Resolution
Yuzhe Zhang (None) · jiawei zhang (Sensetime) · Hao Li (Beihang University) · Zhouxia Wang (Nanyang Technological University) · Luwei Hou (Beihang University) · Dongqing Zou (Sensetime Research) · Liheng Bian (Beijing Institute of Technology)
Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution
Longguang Wang (National University of Defense Technology) · Juncheng Li (Shanghai University) · Yingqian Wang (None) · Qingyong Hu (University of Oxford) · Yulan Guo (SUN YAT-SEN UNIVERSITY)
FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models
Ao Luo (Megvii Technology Inc.) · XIN LI (G42) · Fan Yang (AIQ) · Jiangyu Liu (Megvii Technology Inc.) · Haoqiang Fan (Megvii Technology Inc.) · Shuaicheng Liu (None)
Rethinking Human Motion Prediction with Symplectic Integral
Haipeng Chen (Jilin University) · Kedi Lyu (None) · Zhenguang Liu (Zhejiang University) · Yifang Yin (I2R, A*STAR) · Xun Yang (University of Science and Technology of China) · Yingda Lyu (Jilin University)
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang (University of Pennsylvania) · Fan-Yun Sun (None) · Luca Weihs (Allen Institute for Artificial Intelligence) · Eli VanderBilt (University of Idaho) · Alvaro Herrasti (Allen Institute for Artificial Intelligence) · Winson Han (Allen Institute for Artificial Intelligence) · Jiajun Wu (Stanford University) · Nick Haber (Stanford University) · Ranjay Krishna (University of Washington) · Lingjie Liu (Saarland Informatics Campus, Max-Planck Institute) · Chris Callison-Burch (University of Pennsylvania) · Mark Yatskar (Department of Computer and Information Science, School of Engineering and Applied Science) · Aniruddha Kembhavi (Allen Institute for Artificial Intelligence) · Christopher Clark (None)
Unleashing Network Potentials for Semantic Scene Completion
Fengyun Wang (None) · Qianru Sun (None) · Dong Zhang (The Hong Kong University of Science and Technology) · Jinhui Tang (Nanjing University of Science and Technology)
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
Xintian Mao (East China Normal University) · Xiwen Gao (East China Normal University) · Yan Wang (East China Normal University)
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou (University of Electronic Science and Technology of China) · Leheng Zhang (University of Electronic Science and Technology of China) · Xiaorui Zhao (None) · Keze Wang (SUN YAT-SEN UNIVERSITY) · Leida Li (Xidian University) · Shuhang Gu (University of Electronic Science and Technology of China)
Fully Geometric Panoramic Localization
Junho Kim (Seoul National University) · Jiwon Jeong (Stanford University) · Young Min Kim (Seoul National University)
BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image
Minje Kim (KAIST) · Tae-Kyun Kim (Imperial College London)
Towards Robust 3D Pose Transfer with Adversarial Learning
Haoyu Chen (University of Oulu) · Hao Tang (ETH Zurich and CMU) · Ehsan Adeli (Stanford University) · Guoying Zhao (None)
Building Vision-Language Models on Solid Foundations with Masked Distillation
Sepehr Sameni (University of Bern) · Kushal Kafle (Adobe Systems) · Hao Tan (Adobe Systems) · Simon Jenni (Adobe Systems)
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Jeongsoo Choi (Korea Advanced Institute of Science and Technology) · Se Jin Park (KAIST) · Minsu Kim (None) · Yong Man Ro (Korea Advanced Institute of Science and Technology)
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong (Tsinghua University) · Weihan Wang (Tsinghua University, Tsinghua University) · Qingsong Lv (Tsinghua University, Tsinghua University) · Jiazheng Xu (Tsinghua University) · Wenmeng Yu (None) · Junhui Ji (Zhipu.AI) · Yan Wang (Zhipu AI) · Zihan Wang (Tsinghua University, Tsinghua University) · Yuxiao Dong (Tsinghua University) · Ming Ding (ZHIPU AI) · Jie Tang (Tsinghua University, Tsinghua University)
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang (The University of Hong Kong) · Bingyi Kang (TikTok) · Zilong Huang (Tencent GY Lab) · Xiaogang Xu (Zhejiang Lab) · Jiashi Feng (ByteDance) · Hengshuang Zhao (The University of Hong Kong)
Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network
wenqiao Li (ShanghaiTech University) · Xiaohao Xu (University of Michigan - Ann Arbor) · Yao Gu (Shanghaitech University) · BoZhong Zheng (None) · Shenghua Gao (ShanghaiTech University) · Yingna Wu (ShanghaiTech University)
Discontinuity-preserving Normal Integration with Auxiliary Edges
Hyomin Kim (POSTECH) · Yucheol Jung (POSTECH) · Seungyong Lee (POSTECH)
Learning to navigate efficiently and precisely in real environments
Guillaume Bono (Naver Labs Europe) · Hervé Poirier (Naver Labs Europe) · Leonid Antsfeld (Naver Labs Europe) · Gianluca Monaci (Naver Labs Europe) · Boris Chidlovskii (Naver Labs Europe) · Christian Wolf (Naver Labs Europe)
PAPR in Motion: Seamless Point-level 3D Scene Interpolation
Shichong Peng (None) · Yanshu Zhang (None) · Ke Li (Simon Fraser University)
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
Chenfan Qu (South China University of Technology) · Yiwu Zhong (University of Wisconsin, Madison) · Chongyu Liu (South China University of Technology) · Guitao Xu (South China University of Technology) · Dezhi Peng (South China University of Technology) · Fengjun Guo (Shanghai Jiaotong University) · Lianwen Jin (South China University of Technology)
Dense Vision Transformer Compression with Few Samples
Hanxiao Zhang (Nanjing University) · Yifan Zhou (Nanjing University) · Guo-Hua Wang (None)
Weakly Supervised Monocular 3D Detection with a Single-View Image
Xueying Jiang (Nanyang Technological University) · Sheng Jin (Nanyang Technological University) · Lewei Lu (SenseTime) · Xiaoqin Zhang (Wenzhou University) · Shijian Lu (Nanyang Technological University)
AM-RADIO: Agglomerative Models - Reduce All Domains Into One
Mike Ranzinger (NVIDIA Research) · Greg Heinrich (NVIDIA) · Jan Kautz (NVIDIA) · Pavlo Molchanov (NVIDIA)
Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Jinheng Xie (None) · Songhe Deng (None) · Bing Li (King Abdullah University of Science and Technology) · Haozhe Liu (King Abdullah University of Science and Technology) · Yawen Huang (None) · Yefeng Zheng (None) · Jürgen Schmidhuber (King Abdullah University of Science and Technology) · Bernard Ghanem (KAUST) · Linlin Shen (None) · Mike Zheng Shou (National University of Singapore)
LISA: Reasoning Segmentation via Large Language Model
Xin Lai (None) · Zhuotao Tian (The Chinese University of Hong Kong) · Yukang Chen (None) · Yanwei Li (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Yuhui Yuan (Microsoft Research Asia) · Shu Liu (The Chinese University of Hong Kong) · Jiaya Jia (The Chinese University of Hong Kong)
Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM
Pingping Zhang (Dalian University of Technology) · Tianyu Yan (Dalian University of Technology) · Yang Liu (Dalian University of Technology) · Huchuan Lu (Dalian University of Technology)
IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing
Shaofei Wang (None) · Bozidar Antic (Eberhard-Karls-Universität Tübingen) · Andreas Geiger (University of Tübingen) · Siyu Tang (ETH Zurich)
Exploring Pose-Aware Human-Object Interaction via Hybrid Learning
EASTMAN Z Y WU (Tsinghua University) · Yali Li (Tsinghua University) · Yuan Wang (None) · Shengjin Wang (Tsinghua University, Tsinghua University)
Multi-modal learning for geospatial vegetation forecasting
Vitus Benson (Max-Planck-Institute for Biogeochemistry) · Claire Robin (Max Planck Institute for Biogeochemistry) · Christian Requena-Mesa (Max-Planck Institute for Biogeochemistry) · LAZARO ALONSO SILVA (Max-Planck Institute) · Mélanie Weynants (Max Planck Institute for Biogeochemistry) · Nora Linscheid (Max Planck Institute for Biogeochemistry) · Jose Cortes (Max-Planck Institute) · Zhihan Gao (The Hong Kong University of Science and Technology) · Nuno Carvalhais (Max-Planck Institute) · Markus Reichstein (Max-Planck Institute)
All in One Framework for Multimodal Re-identification in the Wild
He Li (Wuhan University) · Mang Ye (Wuhan University) · Ming Zhang (Guangzhou Urban Planning & Design Survey Research Institute) · Bo Du (Wuhan University)
Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
Chao Yi (Nanjing University) · Lu Ren (Nanjing University) · De-Chuan Zhan (Nanjing University) · Han-Jia Ye (Nanjing University)
Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness
Guangzhi Wang (National University of Singapore) · Yangyang Guo (National University of Singapore) · Ziwei Xu (National University of Singapore) · Mohan Kankanhalli (National University of Singapore)
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald (University of Amsterdam) · Nimrod Barazani (University of Amsterdam) · Cees G. M. Snoek (University of Amsterdam) · Yuki Asano (University of Amsterdam)
MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Honghua Chen (National Technological University) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY) · Xingang Pan (None)
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar (MBZUAI) · Muzammal Naseer (MBZUAI) · Rao Anwer (Mohamed bin Zayed University of Artificial Intelligence) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Michael Felsberg (Linköping University) · Mubarak Shah (University of Central Florida) · Fahad Shahbaz Khan (MBZUAI; Linköping University)
TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
Hantao Yao (None) · Rui Zhang (None) · Changsheng Xu (None)
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Huaibo Huang (Institute of Automation, Chinese Academy of Sciences) · Mingrui Chen (Institute of Automation, Chinese Academy of Sciences (CASIA)) · Hongmin Liu (University of Science and Technology Beijing) · Ran He (None)
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
Lin Song (Tencent AI Lab) · Yukang Chen (None) · Shuai Yang (Hong Kong University of Science and Technology (Guangzhou)) · Xiaohan Ding (Tencent AI Lab) · Yixiao Ge (Tencent) · Ying-Cong Chen (The Hong Kong University of Science and Technology) · Ying Shan (Tencent)
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
Zicheng Zhang (None) · RUOBING ZHENG (None) · Bonan Li (None) · Congying Han (University of Chinese Academy of Sciences) · Tianqi Li (Ant Group) · Meng Wang (Ant Group) · Tiande Guo (University of the Chinese Academy of Sciences) · Jingdong Chen (Ant Group) · Ziwen Liu (University of the Chinese Academy of Sciences) · Ming Yang (Ant Group)
PairDETR: Joint Detection and Association of Human Bodies and Faces
Ammar Ali (ITMO University) · Georgii Gaikov (MTS AI) · Denis Rybalchenko (VisionLabs) · Alexander Chigorin (VisionLabs MENA) · Ivan Laptev (INRIA Paris) · Sergey Zagoruyko (MTS AI)
Language Models as Black-Box Optimizers for Vision-Language Models
Shihong Liu (Carnegie Mellon University) · Samuel Yu (Carnegie Mellon University) · Zhiqiu Lin (Carnegie Mellon University) · Deepak Pathak (Carnegie Mellon University) · Deva Ramanan (Carnegie Mellon University)
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Abdullah J Hamdi (University of Oxford) · Luke Melas-Kyriazi (VGG, University of Oxford) · Jinjie Mai (KAUST) · Guocheng Qian (KAUST) · Ruoshi Liu (Columbia University) · Carl Vondrick (Columbia University) · Bernard Ghanem (KAUST) · Andrea Vedaldi (University of Oxford)
Steerers: A framework for rotation equivariant keypoint descriptors
Georg Bökman (Chalmers University of Technology) · Johan Edstedt (Computer Vision Laboratory, Linköping University) · Michael Felsberg (Linköping University) · Fredrik Kahl (Chalmers University)
On the Faithfulness of Vision Transformer Explanations
Junyi Wu (None) · Weitai Kang (None) · Hao Tang (ETH Zurich and CMU) · Yuan Hong (University of Connecticut) · Yan Yan (Illinois Institute of Technology)
Learning Transferable Negative Prompts for Out-of-Distribution Detection
Tianqi Li (Beihang University) · Guansong Pang (Singapore Management University) · wenjun miao (None) · Xiao Bai (Beijing University of Aeronautics and Astronautics) · Jin Zheng (Beijing University of Aeronautics and Astronautics)
3D Multi-frame Fusion for Video Stabilization
Zhan Peng (None) · Xinyi Ye (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology) · Weiyue Zhao (Shenzhen Dajiang Innovation Technology Co., Ltd) · TIANQI LIU (None) · Huiqiang Sun (None) · Baopu Li (Baidu) · Zhiguo Cao ()
Fun with Flags: Robust Principal Directions via Flag Manifolds
Tolga Birdal (Imperial College London) · Nathan Mankovich (University of Valencia)
Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss
Jaeha Kim (Seoul National University) · Junghun Oh (None) · Kyoung Mu Lee (Seoul National University)
Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement
Kangmin Xu (Wuhan University) · Liang Liao (Nanyang Technological University) · Jing Xiao (Wuhan University) · Chaofeng Chen (Nanyang Technological University) · Haoning Wu (Nanyang Technological University) · Qiong Yan (SenseTime Research) · Weisi Lin (Nanyang Technological University)
COLMAP-Free 3D Gaussian Splatting
Yang Fu (University of California San Diego) · Sifei Liu (NVIDIA) · Amey Kulkarni (NVIDIA) · Jan Kautz (NVIDIA) · Alexei A. Efros (UC Berkeley) · Xiaolong Wang (UCSD)
Towards Realistic Scene Generation with LiDAR Diffusion Models
Haoxi Ran (Carnegie Mellon University) · Vitor Guizilini (Toyota Research Institute) · Yue Wang (Massachusetts Institute of Technology)
Point-VOS: Pointing Up Video Object Segmentation
Sabarinath Mahadevan (RWTH Aachen University) · Idil Esen Zulfikar (RWTH Aachen University) · Paul Voigtlaender (None) · Bastian Leibe (RWTH Aachen University)
Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
Linzhan Mou (Zhejiang University) · Jun-Kun Chen (None) · Yu-Xiong Wang (None)
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
Desai Xie (Stony Brook University) · Jiahao Li (Toyota Technological Institute at Chicago) · Hao Tan (Adobe Systems) · Xin Sun (Adobe Systems) · Zhixin Shu (Adobe Systems) · Yi Zhou (Adobe Systems) · Sai Bi (Adobe Systems) · Soeren Pirk (Adobe) · ARIE KAUFMAN (Stony Brook University)
Exploring Orthogonality in Open World Object Detection
Zhicheng Sun (Peking University) · Jinghan Li (Peking University) · Yadong Mu (Peking University)
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
Chanyoung Kim (Yonsei University) · Woojung Han (Yonsei University) · Dayun Ju (Yonsei University) · Seong Jae Hwang (Yonsei University)
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Chancharik Mitra (University of California, Berkeley) · Brandon Huang (University of California, Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Roei Herzig (Tel Aviv University)
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Seungwoo Yoo (Korea Advanced Institute of Science and Technology (KAIST)) · Kunho Kim (Korea Advanced Institute of Science & Technology) · Vladimir G. Kim (Adobe Systems) · Minhyuk Sung (KAIST)
Unifying Automatic and Interactive Matting with Pretrained ViTs
Zixuan Ye (None) · Wenze Liu (Huazhong University of Science and Technology) · He Guo () · Yujia Liang (Huazhong University of Science and Technology) · Chaoyi Hong (Huazhong University of Science and Technology) · Hao Lu (Huazhong University of Science and Technology) · Zhiguo Cao ()
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
Qin Liu (Department of Computer Science, University of North Carolina, Chapel Hill) · Jaemin Cho (UNC Chapel Hill) · Mohit Bansal (University of North Carolina at Chapel Hill) · Marc Niethammer (The University of North Carolina at Chapel Hill)
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Wonbong Jang (University College London) · Lourdes Agapito (University College London)
Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon (None) · Weipeng Xu (Meta Reality Labs Research) · Rohan Joshi (Facebook) · Chenglei Wu (Meta) · Takaaki Shiratori (Meta Reality Labs Research)
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge (University of Southern California) · Xiaohui Zeng (Department of Computer Science, University of Toronto) · Jacob Huffman (NVIDIA) · Tsung-Yi Lin (NVIDIA) · Ming-Yu Liu (NVIDIA) · Yin Cui (NVIDIA)
Latency Correction for Event-guided Deblurring and Frame Interpolation
Yixin Yang (Peking University) · Jinxiu Liang (Peking University) · Bohan Yu (None) · Yan Chen (Tsinghua University, Tsinghua University) · Jimmy S. Ren (SenseTime Research) · Boxin Shi (Peking University)
ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images
Yiqi Shi (Harbin Engineering University) · Duo Liu (Harbin Engineering University) · Liguo Zhang (Harbin Engineering University) · Ye Tian (Xidian University) · Xuezhi Xia (Harbin Engineering University) · fuxiaojing (Harbin Engineering University)
HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection
Qiming Xia (XMU) · Wei Ye (Xiamen University) · Hai Wu (Xiamen University) · Shijia Zhao (Xiamen University) · Leyuan Xing (Xiamen University) · Xun Huang (Xiamen University) · Jinhao Deng (Xiamen University) · Xin Li (None) · Chenglu Wen (Xiamen University) · Cheng Wang (Xiamen University)
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Hyeongjun Kwon (None) · Jinhyun Jang (Yonsei University) · Jin Kim (Yonsei University, Seoul, South Korea) · Kwonyoung Kim (None) · Kwanghoon Sohn (Yonsei University)
Self-supervised Representation Learning from Arbitrary Scenarios
Zhaowen Li (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Yousong Zhu (Institute of Automation, Chinese Academy of Sciences) · Zhiyang Chen (Institute of automation, Chinese academy of science) · Zongxin Gao (Beijing Institute Of Graphic Communication) · Rui Zhao (Qing Yuan Research Institute, Shanghai Jiao Tong University) · Chaoyang Zhao (Institute of automation, Chinese academy of science) · Ming Tang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Jinqiao Wang (Institute of Automation, Chinese Academy of Sciences)
NEAT: Distilling 3D Wireframes from Neural Attraction Fields
Nan Xue (Ant Group) · Bin Tan (Wuhan University) · Yuxi Xiao (Zhejiang University) · Liang Dong (Google) · Gui-Song Xia (Wuhan University) · Tianfu Wu () · Yujun Shen (The Chinese University of Hong Kong)
FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization
Shuai Tan (Shanghai Jiaotong University) · Bin Ji (Shanghai Jiaotong University) · Ye Pan (Shanghai Jiaotong University)
Generating Content for HDR Deghosting from Frequency View
Tao Hu (Northwestern Polytechnical University) · Qingsen Yan (Northwest Polytechnical University Xi'an) · Yuankai Qi (The University of Adelaide) · Yanning Zhang (Northwestern Polytechnical University)
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko (Google) · Xuehan Xiong (Google) · Josip Djolonga (Google) · Mostafa Dehghani (Google DeepMind) · Chen Sun (Brown University) · Mario Lučić (Google) · Cordelia Schmid (Inria / Google) · Anurag Arnab (Google)
Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model
Tian Liang (None) · Jing Huang (Zhejiang University) · Ming Kong (None) · Luyuan Chen (Beijing Information Science and Technology University) · Qiang Zhu (Zhejiang University)
Dual Prototype Attention for Unsupervised Video Object Segmentation
Suhwan Cho (Yonsei University) · Minhyeok Lee (Yonsei University) · Seunghoon Lee (Yonsei University) · Dogyoon Lee (Yonsei University) · Heeseung Choi (None) · Ig-Jae Kim (Korea Institute of Science and Technology) · Sangyoun Lee (Yonsei University)
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja (BITS Pilani, Birla Institute of Technology and Science) · Muhammad Sohail Danish (Mohamed bin Zayed University of Artificial Intelligence) · Muzammal Naseer (MBZUAI) · Abhijit Das (BITS Pilani, Birla Institute of Technology and Science) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Fahad Shahbaz Khan (MBZUAI; Linköping University)
Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
Zhiwei Yang (Guangzhou Institute of Technology, Xidian University) · Jing Liu (Guangzhou Institute of Technology, Xidian University) · Peng Wu (Northwest Polytechnical University Xi'an)
AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
Jamie Watson (Niantic) · Filippo Aleotti (Niantic, Inc.) · Mohamed Sayed (University College London, University of London) · Zawar Qureshi (None) · Oisin Mac Aodha (University of Edinburgh) · Gabriel J. Brostow (Department of Computer Science, University College London) · Michael Firman (Niantic, Inc.) · Sara Vicente (Niantic)
Prompt Learning via Meta-Regularization
Jinyoung Park (Korea University) · Juyeon Ko (Korea University) · Hyunwoo J. Kim (Korea University)
Addressing Background Context Bias in Few-Shot Segmentation through Iterative Modulation
Lanyun Zhu (Singapore University of Technology and Design) · Tianrun Chen (Zhejiang University) · Jianxiong Yin (NVIDIA) · Simon See (NVIDIA) · Jun Liu (Singapore University of Technology and Design (SUTD))
Rethinking the Region Classification in Open-Vocabulary Semantic Segmentation: An Image-to-Image View
Yuan Wang (University of Science and Technology of China) · Rui Sun (University of Science and Technology of China) · Naisong Luo (University of Science and Technology of China) · Yuwen Pan (University of Science and Technology of China) · Tianzhu Zhang (University of Science and Technology of China)
Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis
Yiyang Chen (South China University of Technology) · Lunhao Duan (Wuhan University) · Shanshan Zhao (JD Explore Academy) · Changxing Ding (South China University of Technology) · Dacheng Tao (None)
KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
Fengyuan Yang (National University of Singapore) · Kerui Gu (National University of Singapore) · Angela Yao (National University of Singapore)
SuperNormal: Neural Surface Reconstruction via Multi-View Normal Integration
Xu Cao (Osaka University) · Takafumi Taketomi (CyberAgent)
Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super-Resolution
Hongjun Wang (None) · Jiyuan Chen (Hong Kong Polytechnic University) · Yinqiang Zheng (None) · Tieyong Zeng (The Chinese University of Hong Kong)
Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
Xiaoyang Wang () · Huihui Bai (Beijing Jiaotong University) · Limin Yu (Xi'an Jiaotong-Liverpool University) · Yao Zhao (Beijing Jiaotong University) · Jimin Xiao (Xi'an Jiaotong-Liverpool University)
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu (Huazhong University of Science and Technology) · Yi Jiang (ByteDance) · Qihao Liu (Johns Hopkins University) · Zehuan Yuan (Nanjing University) · Xiang Bai (Huazhong University of Science and Technology) · Song Bai (ByteDance)
Friendly Sharpness-Aware Minimization
Tao Li (Shanghai Jiao Tong University) · Pan Zhou (Sea Group) · Zhengbao He (Department of Automation, Shanghai Jiao Tong University) · Xinwen Cheng (Shanghai Jiaotong University) · Xiaolin Huang (Shanghai Jiao Tong University, Tsinghua University)
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Xidong Wu (University of Pittsburgh) · Shangqian Gao (University of Pittsburgh) · Zeyu Zhang (Amazon AGI) · Zhenzhen Li (Bosch) · Runxue Bao (GE Healthcare) · Yanfu Zhang (College of William and Mary) · Xiaoqian Wang (Purdue University) · Heng Huang (University of Pittsburgh)
SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image
Yunhao Li (Zhejiang University) · Xiaodong Wang (Zhejiang University) · Ping Wang (Zhejiang University) · Xin Yuan (Westlake University) · Peidong Liu (None)
Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction
Jianping Jiang (Peking University) · xinyu zhou (Peking University) · Bingxuan Wang (Peking University) · Xiaoming Deng (Institute of Software Chinese Academy of Sciences) · Chao Xu (Peking University) · Boxin Shi (Peking University)
Emotional Speech-Driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre (KTH Royal Institute of Technology) · Radek Danecek (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Nikos Athanasiou (None) · Giorgio Becherini (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Christopher Peters (KTH Royal Institute of Technology) · Michael J. Black (University of Tübingen) · Timo Bolkart (Google)
Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang (University of Chicago) · David Yunis (Toyota Technological Institute at Chicago) · Michael Maire (None)
Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification
Kunlun Xu (Peking University) · Xu Zou (Huazhong University of Science and Technology) · Yuxin Peng (Peking University) · Jiahuan Zhou (Peking University)
KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation
Jihua Peng (The Hong Kong Polytechnic University) · Yanghong Zhou (The Hong Kong Polytechnic University) · Tracy P Y Mok (The Hong Kong Polytechnic University)
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
Zaid Khan (Northeastern University) · Yun Fu (Northeastern University)
Optimal Transport Aggregation for Visual Place Recognition
Sergio Izquierdo (I3A - University of Zaragoza) · Javier Civera (I3A, Universidad de Zaragoza)
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Hongyu Zhou (Zhejiang University) · Jiahao Shao (Zhejiang University) · Lu Xu (Zhejiang University) · Dongfeng Bai (Huawei Technologies Ltd.) · Weichao Qiu (Huawei Technologies Ltd.) · Bingbing Liu (Huawei Technologies Ltd.) · Yue Wang (Zhejiang University) · Andreas Geiger (University of Tübingen) · Yiyi Liao (Zhejiang University)
Human Motion Prediction under Unexpected Perturbation
Jiangbei Yue (University of Leeds) · Baiyi Li (University of Leeds) · Julien Pettré (INRIA) · Armin Seyfried (Forschungszentrum Jülich) · He Wang (None)
LLM-AR: When Large Language Model Meets Skeleton-Based Action Recognition
Haoxuan Qu (Singapore University of Technology and Design) · Yujun Cai (Meta) · Jun Liu (Singapore University of Technology and Design (SUTD))
MFP: Making Full use of Probability Maps for Interactive Image Segmentation
Chaewon Lee (None) · Seon-Ho Lee (None) · Chang-Su Kim (Korea University)
Instantaneous Perception of Moving Objects in 3D
Di Liu (Rutgers University, New Brunswick) · Bingbing Zhuang (NEC Labs America) · Dimitris N. Metaxas (Rutgers) · Manmohan Chandraker (UC San Diego)
Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
Jiahao Nie (Nanyang Technological University) · Yun Xing (Nanyang Technological University) · Gongjie Zhang (Black Sesame Tech.) · Pei Yan (Huazhong University of Science and Technology) · Aoran Xiao (Nanyang Technological University) · Yap-peng Tan (Nanyang Technological University) · Alex C. Kot (Nanyang Technological University) · Shijian Lu (Nanyang Technological University)
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
Zhengwei Fang (Beijing Jiaotong University) · Rui Wang (Beijing Jiaotong University) · Tao Huang (Beijing Jiaotong University) · Liping Jing (Beijing Jiaotong University)
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
Junjiao Tian (Georgia Institute of Technology) · Lavisha Aggarwal (Amazon) · Andrea Colaco (Google) · Zsolt Kira (Georgia Institute of Technology) · Mar Gonzalez-Franco (Google)
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Runze He (Institute of Information Engineering, Chinese Academy of Sciences) · Shaofei Huang (Institute of Information Engineering, Chinese Academy of Sciences) · Xuecheng Nie (National University of Singapore) · Tianrui Hui (Hefei University of Technology) · Luoqi Liu (None) · Jiao Dai (Institute of Information Engineering, Chinese Academy of Sciences) · Jizhong Han (Institute of Information Engineering) · Guanbin Li (Sun Yat-sen University) · Si Liu (Beihang University)
Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction
Yizhi Wang (Simon Fraser University) · Wallace Lira (Simon Fraser University) · Wenqi Wang (Computing Science, Simon Fraser University) · Ali Mahdavi Amiri (Simon Fraser University) · Hao Zhang (Simon Fraser University)
Learning to Produce Semi-dense Correspondences for Visual Localization
Khang Truong Giang (Korea Advanced Institute of Science and Technology) · Soohwan Song (Dongguk University) · Sungho Jo (Korea Advanced Institute of Science & Technology)
Differentiable Neural Surface Refinement for Transparent Objects
Weijian Deng (The Australian National University) · Dylan Campbell (Australian National University) · Chunyi Sun (Australian National University) · Shubham Kanitkar (RIOS Intelligent Machines) · Matthew Shaffer (RIOS Intelligent Machines) · Stephen Gould (Australian National University)
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Yunyang Xiong (Facebook) · Balakrishnan Varadarajan (Meta) · Lemeng Wu (University of Texas, Austin) · Xiaoyu Xiang (Meta) · Fanyi Xiao (Meta) · Chenchen Zhu (Meta AI) · Xiaoliang Dai (Facebook) · Dilin Wang (Facebook) · Fei Sun (Meta Inc.) · Forrest Iandola (Meta) · Raghuraman Krishnamoorthi (Facebook) · Vikas Chandra (Facebook)
Look-Up Table Compression for Efficient Image Restoration
Yinglong Li (University of Science and Technology of China) · Jiacheng Li (None) · Zhiwei Xiong (None)
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Wenhao Li (Peking University) · Mengyuan Liu (SUN YAT-SEN UNIVERSITY) · Hong Liu (Peking University) · Pichao Wang (Amazon) · Jialun Cai (peking university) · Nicu Sebe (University of Trento)
RepAn: Enhanced Annealing through Re-parameterization
Xiang Fei (School of Informatics, Xiamen University) · Xiawu Zheng (Xiamen University) · Yan Wang (Samsara) · Fei Chao (Xiamen University) · Chenglin Wu (DeepWisdom) · Liujuan Cao (Xiamen University)
Vector Graphics Generation via Mutually Impulsed Dual-domain Diffusion
Zhongyin Zhao (Shanghai Jiaotong University) · Ye Chen (Shanghai Jiao Tong University) · Zhangli Hu (Shanghai Jiaotong University) · Xuanhong Chen (Shanghai Jiao Tong University) · Bingbing Ni (Shanghai Jiao Tong University)
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang (Xiamen University) · Zongyu Lan (Xiamen University) · Liujuan Cao (Xiamen University) · Xianming Lin () · Shengchuan Zhang (None) · Guannan Jiang (Contemporary Amperex Technology Co., Limited) · Rongrong Ji (Xiamen University)
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Yanwu Xu (Boston University) · Yang Zhao (Google) · Zhisheng Xiao (Google) · Tingbo Hou (Google Research)
Understanding and Improving Source-free Domain Adaptation from a Theoretical Perspective
Yu Mitsuzumi (None) · Akisato Kimura (NTT Corporation) · Hisashi Kashima (Kyoto University)
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan (Meituan) · Pei Fu (meituan) · Shan Guo (Meituan) · Qianyi Jiang (meituan) · Xiaoming Wei (Meituan)
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
Chenbin Pan (Syracuse University) · Burhaneddin Yaman (Bosch Center for Artificial Intelligence) · Senem Velipasalar (Syracuse University) · Liu Ren (Bosch Research)
Hyperspherical Classification with Dynamic Label-to-Prototype Assignment
Mohammad Saadabadi Saadabadi (None) · Ali Dabouei (None) · Sahar Rahimi Malakshan (West Virginia University) · Nasser Nasrabadi (West Virginia University)
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Jiawen Zhu (Singapore Management University) · Guansong Pang (Singapore Management University)
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim (None) · Muzammal Naseer (MBZUAI) · Salman Khan (Mohamed bin Zayed University of Artificial Intelligence) · Ming-Hsuan Yang (University of California at Merced) · Fahad Shahbaz Khan (MBZUAI; Linköping University)
ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain (Carnegie Mellon University) · Pushkal Katara (Carnegie Mellon University) · Nikolaos Gkanatsios (Carnegie Mellon University) · Adam Harley (Ryerson University) · Gabriel Sarch (Carnegie Mellon University) · Kriti Aggarwal (Microsoft) · Vishrav Chaudhary (Microsoft) · Katerina Fragkiadaki (CMU)
Prompt Augmentation for Self-supervised Text-guided Image Manipulation
Rumeysa Bodur (Imperial College London) · Binod Bhattarai (University of Aberdeen) · Tae-Kyun Kim (Imperial College London)
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu (None) · Sule Bai (Tsinghua University) · Guanbin Li (Sun Yat-sen University) · Yitong Wang (ByteDance Inc) · Yansong Tang (Tsinghua University)
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
Jinglin Xu (University of Science and Technology Beijing) · Yijie Guo (Peking University) · Yuxin Peng (Peking University)
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
Mi Yan (Peking University) · Jiazhao Zhang (None) · Yan Zhu (Peking University) · He Wang (None)
MemoNav: Working Memory Model for Visual Navigation
Hongxin Li (Institute of Automation, Chinese Academy of Sciences) · Zeyu Wang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Xu Yang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · yuran Yang (Tencent) · Shuqi Mei (Tencent T-Lab) · Zhaoxiang Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
PointBeV: A Sparse Approach for BeV Predictions
Loick Chambon (Valeo) · Éloi Zablocki (Valeo) · Mickaël Chen (Valeo) · Florent Bartoccioni (Valeo) · Patrick Pérez (None) · Matthieu Cord (None)
Ensemble Diversity Facilitates Adversarial Transferability
Bowen Tang (University of Electronic Science and Technology of China) · Zheng Wang (University of Electronic Science and Technology of China) · Yi Bin (National University of Singapore) · Qi Dou (The Chinese University of Hong Kong) · Yang Yang (University of Electronic Science and Technology of China) · Heng Tao Shen (University of Electronic Science and Technology of China)
POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning
Jiayi Guan (Tongji University) · Li Shen (JD Explore Academy) · Ao Zhou (Tongji University) · Lusong Li (JDT) · Han Hu (Beijing Institute of Technology) · Xiaodong He (JD AI Research) · Guang Chen (Tongji University) · Changjun Jiang (Tongji University)
Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification
kaijie ren (Chongqing University) · Lei Zhang (Chongqing University)
On the Content Bias in Fréchet Video Distance
Songwei Ge (University of Maryland, College Park) · Aniruddha Mahapatra (CMU, Carnegie Mellon University) · Gaurav Parmar (Carnegie Mellon University) · Jun-Yan Zhu (Carnegie Mellon University) · Jia-Bin Huang (University of Maryland, College Park)
Sheared Backpropagation for Finetuning Foundation Models
Zhiyuan Yu (None) · Li Shen (JD Explore Academy) · Liang Ding (Zhejiang University) · Xinmei Tian (University of Science and Technology of China) · Yixin Chen (Washington University, Saint Louis) · Dacheng Tao (None)
Hyperbolic Learning with Synthetic Captions for Open-World Detection
Fanjie Kong (Duke University) · Yanbei Chen (Amazon) · Jiarui Cai (Amazon AWS AI) · Davide Modolo (Amazon)
NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
Jiahao Chen (SUN YAT-SEN UNIVERSITY) · Yipeng Qin (Cardiff University) · Lingjie Liu (Saarland Informatics Campus, Max-Planck Institute) · Jiangbo Lu (SmartMore Corporation) · Guanbin Li (Sun Yat-sen University)
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu (University of Maryland, College Park) · Zhixin Shu (Adobe Systems) · Cameron Smith (Adobe Systems) · Seoung Wug Oh (Adobe Systems) · Jia-Bin Huang (University of Maryland, College Park)
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu (Peking University) · Xiangtai Li (Nanyang Technological University) · Chenyang Si (Nanyang Technological University Singapore) · Shangchen Zhou (Nanyang Technological University) · Jingkang Yang (Nanyang Technological University) · Jiangning Zhang (Tencent Youtu Lab) · Yining Li (Shanghai AI Laboratory) · Kai Chen (Shanghai AI Laboratory) · Yunhai Tong (Peking University) · Ziwei Liu (Nanyang Technological University) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY)
From Activation to Initialization: Scaling Insights for Optimizing Neural Fields
Hemanth Saratchandran (University of Adelaide/Australian Institute of Machine Learning) · Sameera Ramasinghe (Amazon) · Simon Lucey (University of Adelaide)
High Fidelity Person-centric Subject-to-Image Synthesis
Yibin Wang (None) · Weizhong Zhang (Fudan University) · Jianwei Zheng (Zhejiang University of Technology) · Cheng Jin (Fudan University)
Fixed Point Diffusion Models
Luke Melas-Kyriazi (VGG, University of Oxford) · Xingjian Bai (University of Oxford)
Contextual Augmented Global Contrast for Multimodal Intent Recognition
Kaili Sun () · Zhiwen Xie (Central China Normal University) · Mang Ye (Wuhan University) · Huyin Zhang (Wuhan University)
SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Yunfei Fan (PICO, ByteDance) · Tianyu Zhao (Bytedance) · Guidong Wang (PICO)
MACE: Mass Concept Erasure in Diffusion Models
Shilin Lu (Nanyang Technological University) · Zilan Wang (Nanyang Technological University) · Leyang Li (Nanyang Technological University) · Yanzhu Liu (I2R, A*STAR) · Adams Wai-Kin Kong (Nanyang Technological University)
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje (Federal University of Minas Gerais) · Felipe Cadar (Universidade Federal de Minas Gerais) · André Araujo (Google Research) · Renato Martins (Université de Bourgogne) · Erickson R. Nascimento (Universidade Federal de Minas Gerais / Microsoft)
GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation
WEIMING ZHANG () · Yexin Liu (The Hong Kong University of Science and Technology) · Xu Zheng (HKUST) · Lin Wang (Hong Kong University of Science and Technology)
VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun (Nanjing University of Science and Technology) · Jiahui Chen (Beihang University) · Shan Zhang (Australian National University) · Xinyu Zhang (None) · Qiang Chen (Baidu) · gang zhang (Baidu Inc.) · Errui Ding (Baidu Inc.) · Jingdong Wang (Baidu) · Zechao Li (Nanjing University of Science and Technology)
VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang (Nanyang Technological University) · Tianxing Wu (Nanyang Technological University) · Shuai Yang (Nanyang Technological University) · Chenyang Si (Nanyang Technological University Singapore) · Dahua Lin (The Chinese University of Hong Kong) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY) · Ziwei Liu (Nanyang Technological University)
CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
Yao Ni (Australian National University) · Piotr Koniusz (Data61/CSIRO + Australian National University)
Day-Night Cross-domain Vehicle Re-identification
Hongchao Li (Anhui Normal University) · Jingong Chen (Anhui Normal University) · AIHUA ZHENG (Anhui University) · Yong Wu (Anhui Normal University) · YongLong Luo (Anhui Normal University)
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang (Technische Universität München) · Yinyu Nie (Huawei Technologies Ltd.) · Lev Markhasin (None) · Angela Dai () · Justus Thies (Max-Planck Institute for Intelligent Systems) · Matthias Nießner (Technical University of Munich)
SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training
WU Sitong (Department of Computer Science and Engineering, The Chinese University of Hong Kong) · Haoru Tan (HKU) · Zhuotao Tian (The Chinese University of Hong Kong) · Yukang Chen (None) · Xiaojuan Qi (University of Oxford) · Jiaya Jia (The Chinese University of Hong Kong)
StrokeFaceNeRF: Stroke-based Facial Appearance Editing in Neural Radiance Field
Xiao-juan Li (University of the Chinese Academy of Sciences) · Dingxi Zhang (University of Chinese Academy of Science) · Shu-Yu Chen (Chinese Academy of Sciences) · Feng-Lin Liu (Chinese Academy of Sciences)
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu (Apple) · Hadi Pouransari (Apple) · Fartash Faghri (None) · Raviteja Vemulapalli (None) · Oncel Tuzel (Apple)
Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces
Jiahong Wang (None) · Yinwei DU (Department of Computer Science, ETHZ - ETH Zurich) · Stelian Coros (ETHZ - ETH Zurich) · Bernhard Thomaszewski (Swiss Federal Institute of Technology)
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
Soyong Shin (None) · Juyong Kim (Carnegie Mellon University) · Eni Halilaj (Carnegie Mellon University) · Michael J. Black (University of Tübingen)
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong (New York University / Meta AI) · Zhuang Liu (FAIR, Meta AI) · Yuexiang Zhai (University of California Berkeley) · Yi Ma (UC Berkeley) · Yann LeCun (Facebook) · Saining Xie (Facebook)
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng (Huazhong University of Science and Technology) · Lin Song (Tencent AI Lab) · Yixiao Ge (Tencent) · Wenyu Liu (Huazhong University of Science and Technology) · Xinggang Wang (Huazhong University of Science and Technology) · Ying Shan (Tencent)
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
Alireza Ganjdanesh (University of Maryland, College Park) · Shangqian Gao (University of Pittsburgh) · Heng Huang (University of Pittsburgh)
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu (Nanjing University) · Jingyang Zhang (Apple) · Shiwei Li (Apple) · Tian Fang (Hong Kong University of Science and Technology) · David McKinnon (Apple) · Yanghai Tsin (Apple) · Long Quan (The Hong Kong University of Science and Technology) · Xun Cao (Nanjing University) · Yao Yao (Nanjing University)
Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
Hugh Blayney (dRISK.ai) · Hanlin Tian (Imperial College London) · Hamish Scott (dRISK AI) · Nils Goldbeck (dRISK) · Chess Stetson (dRISK) · Panagiotis Angeloudis (Imperial College London)
FedUV: Uniformity and Variance for Heterogeneous Federated Learning
Ha Min Son (University of California, Davis) · Moon-Hyun Kim (Sungkyunkwan University) · Tai-Myoung Chung (Sung Kyun Kwan University) · Chao Huang (University of California, Davis) · Xin Liu (University of California, Davis)
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes
Ziqian Bai (Simon Fraser University & Google) · Feitong Tan (Google) · Sean Fanello (Google) · Rohit Pandey (Google) · Mingsong Dou (Google) · Shichen Liu (Google) · Ping Tan (Hong Kong University of Science and Technology) · Yinda Zhang (Google)
FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance
Yinglong Li (Beihang University) · Hongyu Wu (None) · Wang (None) · Qingzhao Qin (Peking University) · yijiao zhao (None) · Yong Wang (None) · Aimin Hao (None)
RankMatch: Exploring the Better Consistency Regularization for Semi-supervised Semantic Segmentation
Huayu Mai (University of Science and Technology of China) · Rui Sun (University of Science and Technology of China) · Tianzhu Zhang (University of Science and Technology of China) · Feng Wu (University of Science and Technology of China)
Revisiting Adversarial Training under Long-Tailed Distributions
Xinli Yue (Wuhan University) · Ningping Mou (Wuhan University) · Qian Wang (Wuhan University) · Lingchen Zhao (Wuhan University)
From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
Hyeokjun Kweon (KAIST) · Kuk-Jin Yoon (KAIST)
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao (The University of Hong Kong) · Vladislav Golyanik (MPI for Informatics) · Marc Habermann (Saarland Informatics Campus, Max-Planck Institute) · Christian Theobalt (MPI Informatik)
Plug and Play Active Learning for Object Detection
Chenhongyi Yang (University of Edinburgh) · Lichao Huang (Horizon Robotics) · Elliot Crowley (University of Edinburgh)
Learning Structure-from-Motion with Graph Attention Networks
Lucas Brynte (None) · José Pedro Iglesias (None) · Carl Olsson (Lund University) · Fredrik Kahl (Chalmers University)
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
Ruicong Liu (The University of Tokyo) · Takehiko Ohkawa (The University of Tokyo) · Mingfang Zhang (None) · Yoichi Sato (University of Tokyo)
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada (University of Newcastle-upon-Tyne) · David Towers (Newcastle University) · Matthew Forshaw (Newcastle University, UK) · Amir Atapour-Abarghouei (Durham University) · Stephen McGough ()
Joint-Task Regularization for Partially Labeled Multi-Task Learning
Kento Nishi (Harvard University) · Junsik Kim (None) · Wanhua Li (Harvard University) · Hanspeter Pfister (Harvard University)
Mind Artist: Creating Artistic Snapshots with Human Thought
Jiaxuan Chen (Zhejiang University) · Yu Qi (Zhejiang University) · Yueming Wang (Zhejiang University) · Gang Pan (Zhejiang University)
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng (The Chinese University of Hong Kong) · Xiaoyang Wu (The University of Hong Kong) · Li Jiang (Max Planck Institute for Informatics) · Yukang Chen (None) · Hengshuang Zhao (The University of Hong Kong) · Zhuotao Tian (The Chinese University of Hong Kong) · Jiaya Jia (The Chinese University of Hong Kong)
$L_0$-Sampler: An $L_{0}$ Model Guided Volume Sampling for NeRF
Liangchen Li (University of Science and Technology of China) · Juyong Zhang (University of Science and Technology of China)
SAI3D: Segment Any Instance in 3D Scenes
Yingda Yin (None) · Yuzheng Liu (Peking University) · Yang Xiao (Huawei Technologies Ltd.) · Daniel Cohen-Or (Google) · Jingwei Huang (Huawei Technologies Ltd.) · Baoquan Chen (Peking University)
EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Priors
Zhipeng Hu (Leihuo Game, NetEase) · Minda Zhao (NetEase Fuxi AI Lab) · Chaoyi Zhao (Fuxi AI Lab, NetEase) · Xinyue Liang (nanjing university) · Lincheng Li () · Zeng Zhao (Fuxi AI Lab,NetEase, Inc.) · Changjie Fan (Netease, Fuxi AI Lab) · Xiaowei Zhou (None) · Xin Yu (University of Queensland)
Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features
Niladri Shekhar Dutt (University College London; Ready Player Me) · Sanjeev Muralikrishnan (University College London, University of London) · Niloy J. Mitra (University College London)
SGC-Occ: Semantic-Geometry Consistent 3D Occupancy Prediction for Autonomous Driving
Zhiwen Yang (Peking University) · Xiangteng He (Peking University) · Yuxin Peng (Peking University)
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang (Institute of Computing Technology, Chinese Academy of Sciences) · Jie Zhang (Institute of Computing Technology, Chinese Academy of Sciences) · Zheng Yuan (Institute of Computing Technology, Chinese Academy of Sciences) · Shiguang Shan (Institute of Computing Technology, Chinese Academy of Sciences)
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei (Johns Hopkins University) · Chenxi Liu (Waymo) · Siyuan Qiao (Google) · Zhishuai Zhang (Google Deepmind) · Alan L. Yuille (Johns Hopkins University) · Jiahui Yu (Google Brain)
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Yanzuo Lu (SUN YAT-SEN UNIVERSITY) · Manlin Zhang (SUN YAT-SEN UNIVERSITY) · Jinhua Ma (SUN YAT-SEN UNIVERSITY) · Xiaohua Xie (SUN YAT-SEN UNIVERSITY) · Jianhuang Lai (SUN YAT-SEN UNIVERSITY)
Unsupervised Occupancy Learning from Sparse Point Cloud
Amine Ouasfi (INRIA) · Adnane Boukhayma (INRIA)
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM
Tongyan Hua (HKUST(GZ)) · Lin Wang (Hong Kong University of Science and Technology)
GLOW: Global Layout Aware Attacks on Object Detection
Jun Bao (Hangzhou Dianzi University) · Buyu Liu (NEC-Labs) · Kui Ren (Zhejiang University) · Jun Yu (Hangzhou Dianzi University)
DeepCache: Accelerating Diffusion Models for Free
Xinyin Ma (National University of Singapore) · Gongfan Fang (None) · Xinchao Wang (National University of Singapore)
HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention
Xiaolong Tang (Institute of Computing Technology, Chinese Academy of Sciences) · Meina Kan (Institute of Computing Technology, Chinese Academy of Sciences) · Shiguang Shan (Institute of Computing Technology, Chinese Academy of Sciences) · Zhilong Ji (Tomorrow Advancing Life) · Jinfeng Bai (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Xilin Chen (None)
CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection
Mikhail Kennerley (National University of Singapore) · Jian-Gang Wang (A*STAR) · Bharadwaj Veeravalli (NUS) · Robby T. Tan (National University of Singapore)
Neural Underwater Scene Representation
Yunkai Tang (Peking University) · Chengxuan Zhu (Peking University) · Renjie Wan (None) · Chao Xu (Peking University) · Boxin Shi (Peking University)
Scale Decoupled Distillation
Shicai Wei (University of Electronic Science and Technology of China) · Chunbo Luo (University of Electronic Science and Technology of China) · Yang Luo (University of Electronic Science and Technology of China)
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud (University of Texas at Austin) · Yapeng Tian (University of Texas at Dallas) · Diana Marculescu (The University of Texas at Austin)
PolarMatte: Fully Computational Ground-Truth-Quality Alpha Matte Extraction for Images and Video using Polarized Screen Matting
Kenji Enomoto (Adobe Systems) · TJ Rhodes (Adobe Research) · Brian Price (Adobe Research) · Gavin Miller (Adobe)
Traceable Federated Continual Learning
Qiang Wang (Beijing University of Posts and Telecommunications) · Bingyan Liu (None) · Yawen Li (Beijing University of Posts and Telecommunications)
CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu (East China Normal University) · Qing Guo (Institute of High Performance Computing, Singapore, A*STAR) · Felix Juefei Xu () · Yihao Huang (Nanyang Technological University) · Yang Liu (Nanyang Technological University) · Geguang Pu (East China Normal University)
CrossMAE: Cross Modality Masked Autoencoders For Region-Aware Audio-Visual Pre-Training
Yuxin Guo (Institute of Automation, Chinese Academy of Sciences) · Siyang Sun (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Shuailei Ma () · Kecheng Zheng (Ant Group) · Xiaoyi Bao (CASIA) · Shijie Ma (Institute of Automation, Chinese Academy of Sciences) · Wei Zou (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Yun Zheng (SUN YAT-SEN UNIVERSITY)
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
Gihyun Kwon (Korea Advanced Institute of Science & Technology) · Simon Jenni (Adobe Systems) · Ding Li (None) · Joon-Young Lee (Adobe Research) · Jong Chul Ye (Korea Advanced Institute of Science and Technology) · Fabian Caba Heilbron (Adobe Research)
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang (Zhejiang University) · Fan Ma (None) · Linchao Zhu (None) · Yingying Deng (None) · Yi Yang (Zhejiang University)
Vista-LLaMA: Reliable Video Teller via Equal Distance to Visual Tokens
Fan Ma (None) · Xiaojie Jin (ByteDance Inc./TikTok) · Heng Wang (Bytedance) · Yuchen Xian (Zhejiang University) · Jiashi Feng (ByteDance) · Yi Yang (Zhejiang University)
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Wei Dong (Xi'an University of Architecture and Technology) · Xing Zhang (Xi'an University of Architecture and Technology) · Bihui Chen (Xi'an University of Architecture and Technology) · Dawei Yan (Xi'an University of Architecture and Technology) · Zhijun Lin (Northwest Polytechnical University Xi'an) · Qingsen Yan (Northwest Polytechnical University Xi'an) · Peng Wang (University of Wollongong) · Yang Yang (University of Electronic Science and Technology of China)
Real-World Mobile Image Denoising Dataset with Efficient Baselines
Roman Flepp (ETH Zurich) · Andrey Ignatov () · Radu Timofte (University of Würzburg) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.)
PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving
Xinshuo Weng (NVIDIA) · Boris Ivanovic (NVIDIA) · Yan Wang (NVIDIA) · Yue Wang (Massachusetts Institute of Technology) · Marco Pavone (NVIDIA)
SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling
Juhee Lee (Ewha Womans University) · Jewon Kang (Ewha Womans University)
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
Vanessa Skliarova (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Egor Zakharov (ETH Zurich) · Otmar Hilliges (None) · Michael J. Black (University of Tübingen) · Justus Thies (Max-Planck Institute for Intelligent Systems)
MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
Abdallah Dib (Ubisoft) · Luiz Gustavo Hafemann (Ubisoft La Forge) · Emeline Got (La Forge - Ubisoft) · Trevor Anderson (Ubisoft) · Amin Fadaeinejad (None) · Rafael M. O. Cruz (École de technologie supérieure, Université du Québec) · Marc-André Carbonneau (Ubisoft)
Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay
Yuhang Zhou (Harbin Institute of Technology) · Zhongyun Hua (Harbin Institute of Technology Shenzhen)
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
Quan Liu (Shanghai Jiaotong University) · Hongzi Zhu (Shanghai Jiao Tong University) · Zhenxi Wang (Shanghai Jiaotong University) · Yunsong Zhou (Shanghai Jiao Tong University) · Shan Chang (Donghua University, Shanghai) · Minyi Guo (Shanghai Jiaotong University)
PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization
Yanlu Cai (Fudan University) · Weizhong Zhang (Fudan University) · Yuan Wu (Fudan University) · Cheng Jin (Fudan University)
UniHuman: A Unified Model For Editing Human Images in the Wild
Nannan Li (Boston University) · Qing Liu (Adobe Systems) · Krishna Kumar Singh (Adobe Systems) · Yilin Wang (Adobe Systems) · Jianming Zhang (Adobe Systems) · Bryan A. Plummer (None) · Zhe Lin (Adobe Research)
Learning to Select Views for Efficient Multi-View Understanding
Yunzhong Hou (Australian National University) · Stephen Gould (Australian National University) · Liang Zheng (Australian National University)
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
shiyu xuan (Peking University) · Qingpei Guo (Ant Group) · Ming Yang (Ant Group) · Shiliang Zhang (Peking University)
Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields
TIANQI LIU (None) · Xinyi Ye (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology) · Min Shi (None) · Zihao Huang (None) · Zhiyu Pan (None) · Zhan Peng (None) · Zhiguo Cao ()
Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
Yujia Liu (ETH Zürich) · Anton Obukhov (None) · Jan D. Wegner (University of Zurich) · Konrad Schindler (ETH Zurich)
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang (Peking University) · Yang Liu (Peking University)
ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations
Rwiddhi Chakraborty (None) · Adrian de Sena Sletten (University of Tromsø) · Michael C. Kampffmeyer (UiT The Arctic University of Norway)
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
Oded Bialer (General Motors) · Yuval Haitman (Ben-Gurion University of the Negev)
FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
Dongyeong Hwang (Korea Advanced Institute of Science & Technology) · Hyunju Kim (Korea Advanced Institute of Science & Technology) · Sunwoo Kim (Korea Advanced Institute of Science & Technology) · Kijung Shin (Korea Advanced Institute of Science and Technology)
Mip-Splatting: Alias-free 3D Gaussian Splatting
Zehao Yu (None) · Anpei Chen (Department of Computer Science, ETHZ - ETH Zurich) · Binbin Huang (ShanghaiTech University) · Torsten Sattler (Czech Technical University in Prague) · Andreas Geiger (University of Tübingen)
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
Guangyang Wu (Shanghai Jiaotong University) · Xiaohong Liu (Shanghai Jiao Tong University) · Jun Jia (Shanghai Jiaotong University) · Xuehao Cui (University of Michigan - Ann Arbor) · Guangtao Zhai (Shanghai Jiao Tong University)
UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation
Hong Li (Beijing University of Aeronautics and Astronautics) · Yutang Feng (Beijing University of Aeronautics and Astronautics) · Song Xue (Baidu) · Xuhui Liu (Beihang University) · Boyu Liu (Beijing University of Aeronautics and Astronautics) · Bohan Zeng (Beijing University of Aeronautics and Astronautics) · Shanglin Li (Beijing University of Aeronautics and Astronautics) · Jianzhuang Liu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences) · Shumin Han (Baidu) · Baochang Zhang (Beihang University)
UniPTS: A Unified Framework for Proficient Post-Training Sparsity
JingJing Xie (Xiamen University) · Yuxin Zhang (Xiamen University) · Mingbao Lin (Xiamen University) · ZhiHang Lin (Xiamen University) · Liujuan Cao (Xiamen University) · Rongrong Ji (Xiamen University)
PBWR: Parametric Building Wireframe Reconstruction from Aerial LiDAR Point Clouds
Shangfeng Huang (University of Calgary) · Ruisheng Wang (University of Calgary) · Bo Guo (Guangdong University of Technology) · Hongxin Yang (University of Calgary)
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen (Fudan University) · Xin Chen (University of Chinese Academy of Sciences, ShanghaiTech University) · Chi Zhang (Tencent ) · Mingsheng Li (Fudan University) · Gang Yu (Tencent) · Hao Fei (National University of Singapore) · Hongyuan Zhu (Institute for Infocomm Research) · Jiayuan Fan (Fudan University) · Tao Chen (Fudan University)
ProMark: Proactive Diffusion Watermarking for Causal Attribution
Vishal Asnani (Michigan State University) · John Collomosse (University of Surrey) · Tu Bui (Fujitsu Research and Development Center Co. Ltd.) · Xiaoming Liu (None) · Shruti Agarwal (None)
MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong (University of North Carolina at Charlotte) · Pu Wang (University of North Carolina at Charlotte) · Minwoo Lee (University of North Carolina, Charlotte) · Chen Chen ()
Bridging the Gap Between End-to-End and Two-Step Text Spotting
Mingxin Huang (None) · Hongliang Li (South China University of Technology) · Yuliang Liu (Huazhong University of Science and Technology) · Xiang Bai (Huazhong University of Science and Technology) · Lianwen Jin (South China University of Technology)
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
Xiao Chen (The Chinese University of Hong Kong) · Quanyi Li (University of Edinburgh) · Tai Wang (Shanghai AI Laboratory) · Tianfan Xue (The Chinese University of Hong Kong) · Jiangmiao Pang (Shanghai AI Laboratory )
Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning
Fan Qi (Tianjin University of Technology) · Shuai Li (Tianjin University of Technology)
VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
Leyuan Liu (Central China Normal University) · Yuhan Li (Central China Normal University) · Yunqi Gao (Central China Normal University) · Changxin Gao (Huazhong University of Science and Technology) · Yuanyuan Liu (None) · Jingying Chen (Central China Normal University)
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
Yifang Men (Alibaba Group) · Biwen Lei (Alibaba Group) · Yuan Yao (Alibaba group) · Miaomiao Cui (Alibaba Group) · Zhouhui Lian (Peking University) · Xuansong Xie (Alibaba Group)
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long (The University of Hong Kong) · Yuan-Chen Guo (Tsinghua University) · Cheng Lin (Tencent) · Yuan Liu (The University of Hong Kong) · Zhiyang Dou (The University of Hong Kong) · Lingjie Liu (Saarland Informatics Campus, Max-Planck Institute) · Yuexin Ma (ShanghaiTech University) · Song-Hai Zhang (Tsinghua University) · Marc Habermann (Saarland Informatics Campus, Max-Planck Institute) · Christian Theobalt (MPI Informatik) · Wenping Wang (Texas A&M University - College Station)
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha (Kakao Brain) · Woo-Young Kang (Kakaobrain) · Jonghwan Mun (KakaoBrain) · Byungseok Roh (Kakao Brain)
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
Zhiqi Li (Nanjing University) · Zhiding Yu (NVIDIA) · Shiyi Lan (NVIDIA CORPORATION) · Jiahan Li (Nanjing University) · Jan Kautz (NVIDIA) · Tong Lu (Nanjing University) · Jose M. Alvarez (NVIDIA)
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan (Northeastern University) · Vijay Kumar BG (NEC Laboratories America) · Samuel Schulter (NEC Laboratories America) · Yun Fu (Northeastern University) · Manmohan Chandraker (UC San Diego)
MoMask: Generative Masked Modeling of 3D Human Motions
chuan guo (University of Alberta) · Yuxuan Mu (University of Alberta) · Muhammad Gohar Javed (University of Alberta) · Sen Wang (HoYoverse) · Li Cheng (University of Alberta)
Text2Loc: 3D Point Cloud Localization from Natural Language
Yan Xia (Technical University of Munich) · Letian Shi (Technische Universität München) · Zifeng Ding (LMU Munich) · João F. Henriques (University of Oxford) · Daniel Cremers (Technical University Munich)
Gaussian Shadow Casting for Neural Characters
Luis Bolanos (The University of British Columbia) · Shih-Yang Su (University of British Columbia) · Helge Rhodin (UBC)
SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Jonathan F. Carter (University of Oxford) · Joao Jorge (Oxehealth) · Oliver Gibson (Oxehealth Limited) · Lionel Tarassenko (University of Oxford)
Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
Seungwook Kim (POSTECH) · Kejie Li (University of Oxford) · Xueqing Deng (ByteDance Research) · Yichun Shi (ByteDance) · Minsu Cho (POSTECH) · Peng Wang (Bytedance US AILab)
BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye (Southern University of Science and Technology) · Chao Fan (None) · Jingzhe Ma (Southern University of Science and Technology) · Xiaoming Liu (None) · Shiqi Yu (Southern University of Science and Technology)
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Haochen Han (Xi'an Jiaotong University) · Qinghua Zheng (Xi'an Jiaotong University) · Guang Dai (SGIT AI) · Minnan Luo (None) · Jingdong Wang (Baidu)
Gaussian Shell Maps for Efficient 3D Human Generation
Rameen Abdal (Stanford University) · Wang Yifan (Stanford University) · Zifan Shi (HKUST) · Yinghao Xu (Chinese University of Hong Kong) · Ryan Po (Stanford University) · Zhengfei Kuang (Stanford University) · Qifeng Chen (Hong Kong University of Science and Technology) · Dit-Yan Yeung (Hong Kong University of Science and Technology) · Gordon Wetzstein (Stanford University)
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments
Liyuan Zhu (Stanford University) · Shengyu Huang (None) · Konrad Schindler (ETH Zurich) · Iro Armeni (Stanford University)
Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition
Kyle Buettner (None) · Sina Malakouti (University of Pittsburgh) · Xiang Li (University of Pittsburgh) · Adriana Kovashka (University of Pittsburgh)
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
Yuhao Sun (University of Science and Technology of China) · Lingyun Yu (University of Science and Technology of China) · Hongtao Xie (University of Science and Technology of China) · Jiaming Li (University of Science and Technology of China) · Yongdong Zhang (University of Science and Technology of China)
Loopy-SLAM: Dense Neural SLAM with Loop Closures
Lorenzo Liso (ETHZ - ETH Zurich) · Erik Sandström (ETH Zürich) · Vladimir Yugay (University of Amsterdam) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Martin R. Oswald (University of Amsterdam)
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
Xiaoliang Ju (The Chinese University of Hong Kong) · Zhaoyang Huang (The Chinese University of Hong Kong) · Yijin Li (Zhejiang University) · Guofeng Zhang (Zhejiang University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Hongsheng Li (The Chinese University of Hong Kong)
Feedback-Guided Autonomous Driving
Jimuyang Zhang (None) · Zanming Huang (Boston University) · Arijit Ray (Boston University) · Eshed Ohn-Bar (Boston University)
Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
Yu (None) · Jie Huang (University of Science and Technology of China) · Li (None) · Kaiwen Zheng (University of Science and Technology of China) · Qi Zhu (University of Science and Technology of China) · Man Zhou (University of Science and Technology of China) · Feng Zhao (University of Science and Technology of China)
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering
Jaehoon Choi (University of Maryland, College Park) · Rajvi Shah (Facebook) · Qinbo Li (Facebook) · Yipeng Wang (Meta Reality Labs) · Ayush Saraf (Meta Platforms, Inc.) · Changil Kim (Facebook) · Jia-Bin Huang (University of Maryland, College Park) · Dinesh Manocha (University of Maryland, College Park) · Suhib Alsisan (Meta) · Johannes Kopf (Facebook)
Test-Time Linear Out-of-Distribution Detection
Ke Fan (Fudan University) · Tong Liu (BOE Technology Group Co., Ltd) · Xingyu Qiu (Fudan University) · Yikai Wang (None) · Lian Huai (BOE Technology Group Co., Ltd) · Zeyu Shangguan (BOE TECHNOLOGY GROUP CO., LTD) · Shuang Gou (BOE Technology Group Co., Ltd) · FENGJIAN LIU (BOE Technology Group Co., Ltd ) · Yuqian Fu (Fudan University) · Yanwei Fu (Fudan University) · Xingqun Jiang (BOE Technology Group Co., LTD)
Matching Anything by Segmenting Anything
Siyuan Li (ETH Zurich) · Lei Ke (HKUST & ETH Zurich) · Martin Danelljan (ETH Zurich) · Luigi Piccinelli (ETH Zurich) · Mattia Segu (ETH Zurich - Swiss Federal Institute of Technology) · Luc Van Gool (ETH Zurich; KULeuven; INSAIT Sofia Un.) · Fisher Yu (ETH Zurich)
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng (Meituan Inc.) · Yujie Zhong (Meituan Inc.) · Zequn Jie (Meituan) · Weidi Xie (Shanghai Jiaotong University) · Lin Ma (Meituan)
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang (None) · Sule Bai (Tsinghua University) · Guangyi Chen (MBZUAI, CMU) · Lei Chen (Beijing University of Science and Technology) · Jiwen Lu (Tsinghua University) · Junle Wang (Tencent) · Yansong Tang (Tsinghua University)
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang (None) · Siyang Song (University of Leicester) · Cheng Luo (Shenzhen University) · Songhe Deng (None) · Weicheng Xie (Shenzhen University) · Linlin Shen (None)
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
Zijie Chen (Westlake University) · Lichao Zhang (Westlake University) · Fangsheng Weng (https://xinchenai.com/) · Lili Pan (University of Electronic Science and Technology of China) · ZHENZHONG Lan (None)
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation
Sixian Zhang (None) · Xinyao Yu (University of the Chinese Academy of Sciences) · Xinhang Song (None) · XIAOHAN Wang (Xi'an Jiaotong University) · Shuqiang Jiang (Institute of Computing Technology, Chinese Academy of Sciences)
Multi-view Aggregation Network for Dichotomous Image Segmentation
Qian Yu (Dalian University of Technology) · Xiaoqi Zhao (Dalian University of Technology) · Youwei Pang (Dalian University of Technology) · Lihe Zhang (Dalian University of Technology) · Huchuan Lu (Dalian University of Technology)
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li (The University of Tokyo) · Duc Minh Vo (The University of Tokyo) · Akihiro Sugimoto (NII) · Hideki Nakayama (The University of Tokyo)
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao (University of Michigan - Ann Arbor) · Siavash Khodadadeh (Adobe Systems) · Kevin Duarte (Adobe Systems) · Wei-An Lin (Adobe Systems) · Hui Qu (Adobe Inc.) · Mingi Kwon (None) · Ratheesh Kalarot (Adobe Systems)
CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
Fu-Zhao Ou (None) · Chongyi Li () · Shiqi Wang (City University of Hong Kong) · Sam Kwong (Lingnan University)
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
Zhihao Zhang (Xi'an Jiaotong University) · Shengcao Cao (University of Illinois at Urbana-Champaign) · Yu-Xiong Wang (None)
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada (Keio University) · Kanta Kaneda (Keio University) · Daichi Saito (Keio University) · Komei Sugiura (Keio University)
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
Guangyu Wang (Tsinghua University) · Jinzhi Zhang (Electronic Engineering, Tsinghua University) · Fan Wang (Alibaba Group) · Ruqi Huang (Tsinghua Shenzhen International Graduate School/Tsinghua Berkeley Shenzhen Institute) · Lu Fang (Tsinghua University)
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
Adrian Bulat (None) · Yassine Ouali (Samsung) · Georgios Tzimiropoulos (Queen Mary University London)
Differentiable Micro-Mesh Construction
Yishun Dou (Huawei) · Zhong Zheng (huawei.com) · Qiaoqiao Jin (Shanghai Jiao Tong University) · Rui Shi (Shanghai Jiao Tong University) · Yuhan Li (None) · Bingbing Ni (Shanghai Jiao Tong University)
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
Qiang Zhu (University of Electronic Science and Technology of China) · Jinhua Hao (Kuaishou Tech) · Yukang Ding (Kuaishou Tech) · Yu Liu (University of Electronic Science and Technology of China) · Qiao Mo (University of Electronic Science and Technology of China) · Ming Sun (Kuaishou Tech) · Chao Zhou (kuaishou) · Shuyuan Zhu (University of Electronic Science and Technology of China)
Enhancing Vision-Language Pretraining with Rich Supervisions
Yuan Gao (Computer Science Department, Stanford University) · Kunyu Shi (Amazon) · Pengkai Zhu (Boston University) · Edouard Belval (Amazon) · Oren Nuriel (Amazon) · Srikar Appalaraju (Amazon) · Shabnam Ghadar (Amazon) · Zhuowen Tu (University of California, San Diego) · Vijay Mahadevan (Amazon) · Stefano Soatto (AWS)
HOISDF: Constraining 3D Hand Object Pose Estimation with Global Signed Distance Fields
Haozhe Qi (EPFL - Switzerland) · Chen Zhao (EPFL) · Mathieu Salzmann (EPFL) · Alexander Mathis (None)
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui (University of Central Florida) · Alejandro Aparcedo (University of Central Florida) · Young Kyun Jang (Meta AI) · Ser-Nam Lim (Meta AI)
Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
Daan de Geus (Eindhoven University of Technology) · Gijs Dubbelman (Eindhoven University of Technology)
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
Wei Zhang (University of Science and Technology of China) · Chaoqun Wan (Alibaba Group) · Tongliang Liu (Mohamed bin Zayed University of Artificial Intelligence) · Xinmei Tian (University of Science and Technology of China) · Xu Shen (Alibaba Group) · Jieping Ye (Alibaba Group)
Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
Xin Gao (None) · Tianheng Qiu (University of Science and Technology of China) · Xinyu Zhang (None) · Hanlin Bai (China University of Mining Technology - Beijing) · Kang Liu (None) · xuan huang (Chinese Academy of Sciences) · Hu Wei (Chinese Academy of Sciences) · Guoying Zhang (China University of Mining Technology - Beijing) · Huaping Liu (Tsinghua University)
Countering Personalized Text-to-Image Generation with Influence Watermarks
Hanwen Liu (Peking University) · Zhicheng Sun (Peking University) · Yadong Mu (Peking University)
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
Yunsong Wang (National University of Singapore) · Hanlin Chen (National University of Singapore) · Gim Hee Lee (National University of Singapore)
SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation
Yanjie Wang (Huazhong University of Science and Technology) · Xu Zou (Huazhong University of Science and Technology) · Luxin Yan (Huazhong University of Science and Technology) · Sheng Zhong (Huazhong University of Science and Technology) · Jiahuan Zhou (Peking University)
Automatic Controllable Colorization via Imagination
Xiaoyan Cong (Zhejiang University) · Yue Wu (Huawei Technologies Ltd.) · Qifeng Chen (Hong Kong University of Science and Technology) · Chenyang Lei (The Hong Kong University of Science and Technology)
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao (the University of Hong Kong) · Yan-Pei Cao (Tencent ARC Lab) · Kai Han (The University of Hong Kong) · Ying Shan (Tencent) · Kwan-Yee K. Wong (The University of Hong Kong)
Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
Guobin Shen (None) · Dongcheng Zhao (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Tenglong Li (Institute of automation, Chinese Academy of Sciences) · Jindong Li (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Yi Zeng (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Peng Xu (Zhejiang University) · Zhiyu Xiang (None) · Chengyu Qiao (Zhejiang University) · Jingyun Fu (Zhejiang University) · Tianyu Pu (Zhejiang University)
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Shweta Mahajan (University of British Columbia) · Tanzila Rahman (University of British Columbia) · Kwang Moo Yi (University Of British Columbia) · Leonid Sigal (University Of British Columbia)
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
Sebastian Koch (Ulm University / Bosch Center for AI) · Narunas Vaskevicius (Robert Bosch GmbH, Bosch) · Mirco Colosi (Robert Bosch GmbH) · Pedro Hermosilla (Technische Universität Wien) · Timo Ropinski (Ulm University)
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
Cheng Huang (Huazhong University of Science and Technology) · Shoudong Han (Huazhong University of Science and Technology) · Mengyu He (Huazhong University of Science and Technology) · Wenbo Zheng (Huazhong University of Science and Technology) · Yuhao Wei (Huazhong University of Science and Technology)
PoseGPT: Chatting about 3D Human Pose
Yao Feng (None) · Jing Lin (Tsinghua University) · Sai Kumar Dwivedi (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Yu Sun (Harbin Institute of Technology) · Priyanka Patel (Max-Planck Institute) · Michael J. Black (University of Tübingen)
Improved Baselines with Visual Instruction Tuning
Haotian Liu (University of Wisconsin-Madison) · Chunyuan Li (Microsoft Research, Redmond) · Yuheng Li (University of Wisconsin - Madison) · Yong Jae Lee (Department of Computer Sciences, University of Wisconsin - Madison)
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Jiaxin Zhang (South China University of Technology) · Dezhi Peng (South China University of Technology) · Chongyu Liu (South China University of Technology) · Peirong Zhang (South China University of Technology) · Lianwen Jin (South China University of Technology)
MoST: Motion Style Transformer between Diverse Action Contents
Boeun Kim (Seoul National University) · Jungho Kim (KETI) · Hyung Jin Chang (University of Birmingham) · Jin Young Choi (Seoul National University)
Bilateral Propagation Network for Depth Completion
Jie Tang (National University of Defense Technology) · Fei-Peng Tian (Light Illusions) · Boshi An (Peking University) · Jian Li (National University of Defense Technology) · Ping Tan (Hong Kong University of Science and Technology)
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao (Purdue University) · Jiang Wang (Microsoft) · Ze Wang (Purdue University) · Zhengyuan Yang (Microsoft) · Lijuan Wang (Microsoft) · Qiang Qiu (Purdue University) · Zicheng Liu (Microsoft)
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation
Lior Talker (Samsung R&D Israel) · Aviad Cohen (Samsung) · Erez Yosef (Tel Aviv University) · Alexandra Dana (Samsung) · Michael Dinerstein (Samsung)
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang (The Chinese University of Hong Kong) · Li Chen (The University of Hong Kong) · Yanan Sun (The Hong Kong University of Science and Technology) · Hongyang Li (Shanghai AI Lab)
On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving
Kaituo Feng (Beijing Institute of Technology) · Changsheng Li (None) · Dongchun Ren (ALLRIDE.AI) · Ye Yuan (Beijing Institute of Technology) · Guoren Wang (Beijing Institute of Technology)
NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models
Yusuf Dalva (Virginia Polytechnic Institute and State University) · Pinar Yanardag (Virginia Polytechnic Institute and State University)
Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
Hao Ai (The Hong Kong University of Science and Technology (Guangzhou Campus)) · Lin Wang (Hong Kong University of Science and Technology)
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev (Yandex) · Dmitry Baranchuk (Higher School of Economics) · Artem Fedorov (Moscow Institute of Physics and Technology) · Artem Babenko (Yandex)
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
Fangfu Liu (Tsinghua University) · Diankun Wu (Tsinghua University) · Yi Wei (None) · Yongming Rao (Tsinghua University) · Yueqi Duan (None)
Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts
Fei Ni (Tianjin University) · Jianye Hao (Tianjin University) · Shiguang Wu (Huawei Technologies Ltd.) · Longxin Kou (None) · Jiashun Liu (Tianjin University) · YAN ZHENG (Tianjin University, China) · Bin Wang (Huawei Noah's Ark Lab) · Yuzheng Zhuang (Huawei Technologies Ltd.)
Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang (The Chinese University of Hong Kong) · Zhiding Yu (NVIDIA) · Christopher Choy (Stanford University) · Renhao Wang (University of California, Berkeley) · Anima Anandkumar (California Institute of Technology) · Jose M. Alvarez (NVIDIA)
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Angchi Xu (SUN YAT-SEN UNIVERSITY) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY)
Infrared Small Target Detection with Scale and Location Sensitivity
Qiankun Liu (Beijing Institute of Technology) · Rui Liu (Beijing Institute of Technology) · Bolun Zheng (Hangzhou Dianzi University) · Hongkui Wang (Hangzhou Dianzi University) · Ying Fu (None)
Minimal Perspective Autocalibration
Andrea Porfiri Dal Cin (Polytechnic Institute of Milan) · Timothy Duff (University of Washington) · Luca Magri (Polytechnic Institute of Milan) · Tomas Pajdla (CIIRC - Czech Technical University in Prague)
SVGDreamer: Text Guided SVG Generation with Diffusion Model
XiMing Xing (Beihang University) · Chuang Wang (Beihang University) · Haitao Zhou (Beihang University) · Jing Zhang (Beihang University) · Dong Xu (University of Hong Kong) · Qian Yu (Beihang University)
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
xin zhang (Xidian University) · Jiawei Du (Centre for Frontier AI Research (CFAR), A*STAR, Singapore) · Weiying Xie (None) · Yunsong Li () · Joey Tianyi Zhou (National University of Singapore)
GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
Jiang Wu (Northwest Polytechnical University Xi'an) · Rui Li (None) · Haofei Xu (ETH Zurich) · Wenxun Zhao (None) · Yu Zhu (Northwest Polytechnical University Xi'an) · Jinqiu Sun (Northwest Polytechnical University Xi'an) · Yanning Zhang (Northwestern Polytechnical University)
Paint3D: Paint Anything 3D with Lighting-less Texture Diffusion Models
Xianfang Zeng (Tencent PCG) · Xin Chen (University of Chinese Academy of Sciences, ShanghaiTech University) · Zhongqi Qi (Tencent PCG) · Wen Liu (Tencent PCG) · Zibo Zhao (None) · Zhibin Wang (Tencent LightAI Lab) · Bin Fu (Tencent) · Yong Liu (Zhejiang University) · Gang Yu (Tencent)
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Dongyoung Kim (Yonsei University) · Jinwoo Kim (Yonsei University) · Junsang Yu (Samsung) · Seon Joo Kim (Yonsei University)
Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Buzhen Huang (None) · Chen Li (National University of Singapore) · Chongyang Xu (Sichuan University) · Liang Pan (Shanghai AI Laboratory) · Yangang Wang (Southeast University) · Gim Hee Lee (National University of Singapore)
VRetouchEr: Learning Cross-frame Feature Interdependence with Imperfection Flow for Face Retouching in Videos
Wen Xue (South China University of Technology) · Le Jiang (South China University of Technology) · Lianxin Xie (South China University of Technology) · Si Wu (South China University of Technology) · Yong Xu (Peng Cheng Laboratory) · Hau San Wong (City University of Hong Kong)
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma (University of Electronic Science and Technology of China) · Guanchao Qiao (University of Electronic Science and Technology of China) · Yian Liu (University of Electronic Science and Technology of China) · Liwei Meng (University of Electronic Science and Technology of China) · Ning Ning (University of Electronic Science and Technology of China) · Yang Liu (University of Electronic Science and Technology of China) · Shaogang Hu (University of Electronic Science and Technology of China)
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
Boqiang Zhang (None) · Hongtao Xie (University of Science and Technology of China) · Zuan Gao (University of Science and Technology of China) · Yuxin Wang (University of Science and Technology of China)
Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation
Xianghui Xie (University of Tübingen) · Bharat Lal Bhatnagar (Meta) · Jan Lenssen (Saarland Informatics Campus, Max-Planck Institute) · Gerard Pons-Moll (University of Tübingen)
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng (Alibaba Group) · Biao Gong (Alibaba Group) · Di Chen (Alibaba Group) · Yujun Shen (The Chinese University of Hong Kong) · Yu Liu (Alibaba Group) · Jingren Zhou (Alibaba Group)
GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds
Shengjun Zhang (Tsinghua University) · Xin Fei (Tsinghua University) · Yueqi Duan (None)
SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis
Teng Hu () · Ran Yi (Shanghai Jiao Tong University) · Baihong Qian (Shanghai Jiaotong University) · Jiangning Zhang (Tencent Youtu Lab) · Paul L. Rosin (Cardiff University) · Yu-Kun Lai (Cardiff University)
Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam (UNC Chapel Hill) · Vu Bao Ngan Ho (University of North Carolina at Chapel Hill) · Xitong Yang (Meta) · Tushar Nagarajan (Meta) · Lorenzo Torresani (Facebook) · Gedas Bertasius (UNC Chapel Hill)
Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention, Alignment and Prompt Tuning
Leslie Ching Ow Tiong (Samsung Electronics) · Dick Sigmund (AIDOT Inc.) · Chen-Hui Chan (Korea Institute of Science and Technology) · Andrew Beng Jin Teoh (Yonsei University)
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye (Carnegie Mellon University) · Abhinav Gupta (Carnegie Mellon University) · Kris Kitani (Carnegie Mellon University) · Shubham Tulsiani (Carnegie Mellon University)
MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu (Carnegie Mellon University) · Zhizhuo Zhou (Stanford University) · Varun Jampani (Google Research) · Shubham Tulsiani (Carnegie Mellon University)
IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
Mengshun Hu (None) · Kui Jiang (Harbin Institute of Technology) · Zhihang Zhong (Shanghai AI Lab) · Zheng Wang (Wuhan University) · Yinqiang Zheng (None)
Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
Anqi Zhu (None) · Qiuhong Ke (Monash University) · Mingming Gong (University of Melbourne) · James Bailey (The University of Melbourne)
Semantic-aware SAM for Point-Prompted Instance Segmentation
Zhaoyang Wei (University of the Chinese Academy of Sciences) · Pengfei Chen (University of the Chinese Academy of Sciences) · Xuehui Yu (None) · Guorong Li (University of Chinese Academy of Sciences) · Jianbin Jiao () · Zhenjun Han (University of the Chinese Academy of Sciences)
CoGS: Controllable Gaussian Splatting
Heng Yu (Carnegie Mellon University) · Joel Julin (Carnegie Mellon University) · Zoltán Á. Milacski (Carnegie Mellon University) · Koichiro Niinuma (Fujitsu Research of America) · László A. Jeni (Carnegie Mellon University)
A Bayesian Approach to OOD Robustness in Image Classification
Prakhar Kaushik (Johns Hopkins University) · Adam Kortylewski (University of Freiburg & MPI-INF) · Alan L. Yuille (Johns Hopkins University)
Multimodal Sense-Informed Prediction of 3D Human Motions
Zhenyu Lou (None) · Qiongjie Cui (Nanjing University of Science and Technology) · Haofan Wang (Xiaohongshu) · Xu Tang (Shanghaitech University) · Hong Zhou (Zhejiang University)
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu (None) · Christopher Clark (None) · Sangho Lee (Allen Institute for Artificial Intelligence) · Zichen Zhang (Allen Institute for Artificial Intelligence) · Savya Khosla (University of Illinois Urbana-Champaign) · Ryan Marten (None) · Derek Hoiem (University of Illinois at Urbana-Champaign) · Aniruddha Kembhavi (Allen Institute for Artificial Intelligence)
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv (None) · Hong Chen (Beijing University of Aeronautics and Astronautics) · Jinyang Guo (Beijing University of Aeronautics and Astronautics) · Yifu Ding (None) · Xianglong Liu (BUAA)
Leveraging Predicate and Triplet Learning for Scene Graph Generation
Jiankai Li (Beihang University) · Yunhong Wang (Beihang University) · Xiefan Guo (Beihang University) · Ruijie Yang (Beihang University) · Weixin Li (None)
Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
Alvi Md Ishmam (Virginia Polytechnic Institute and State University) · Chris Thomas (Virginia Polytechnic Institute and State University)
PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors
Haley So (Stanford University) · Laurie Bose (None) · Piotr Dudek (University of Manchester) · Gordon Wetzstein (Stanford University)
Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning
Xialei Liu (Nankai University) · Jiang-Tian Zhai (Nankai University) · Andrew Bagdanov (Università degli Studi di Firenze) · Ke Li (Tencent) · Ming-Ming Cheng (Nankai University, Tsinghua University)
Action Detection via an Image Diffusion Process
Lin Geng Foo (Singapore University of Technology and Design) · Tianjiao Li (Singapore University of Technology and Design) · Hossein Rahmani (Lancaster University) · Jun Liu (Singapore University of Technology and Design (SUTD))
Disentangled Prompt Representation for Domain Generalization
De Cheng (Xidian University) · Zhipeng Xu (Xi'an University of Electronic Science and Technology) · XINYANG JIANG (Microsoft Research) · Nannan Wang (Xidian University) · Dongsheng Li (Microsoft Research Asia) · Xinbo Gao (Chongqing University of Post and Telecommunications)
UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li (University of Hong Kong) · Xiaogang Xu (Zhejiang Lab) · Ser-Nam Lim (Meta AI) · Hengshuang Zhao (The University of Hong Kong)
PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
Jingbo Wang (Shanghai AI LAB) · Zhengyi Luo (Carnegie Mellon University) · Ye Yuan (NVIDIA Research) · Yixuan LI (The Chinese University of Hong Kong) · Bo Dai (Shanghai AI Laboratory)
SAOR: Single-View Articulated Object Reconstruction
Mehmet Aygun (University of Edinburgh) · Oisin Mac Aodha (University of Edinburgh)
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
Tomas Soucek (Czech Technical University of Prague) · Dima Damen (University of Bristol and Google DeepMind) · Michael Wray (University of Bristol) · Ivan Laptev (INRIA Paris) · Josef Sivic (Czech Technical University in Prague)
TULIP: Transformer for Upsampling of LiDAR Point Cloud
Bin Yang (ETHZ - ETH Zurich) · Patrick Pfreundschuh (ETHZ - ETH Zurich) · Roland Siegwart (Swiss Federal Institute of Technology) · Marco Hutter (ETHZ - ETH Zurich) · Peyman Moghadam (None) · Vaishakh Patil (ETHZ - ETH Zurich)
Incremental Residual Concept Bottleneck Models
Chenming Shang (Tsinghua University) · Shiji Zhou (Tsinghua University, Tsinghua University) · Hengyuan Zhang (Tsinghua University, Tsinghua University) · Xinzhe Ni (Tsinghua University) · Yujiu Yang (Tsinghua University) · Yuwang Wang (Tsinghua University, Tsinghua University)
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
Han Wu (Sun Yat-sen University) · Guanyan Ou (Sun Yat-sen University) · Weibin Wu (SUN YAT-SEN UNIVERSITY) · Zibin Zheng (Sun Yat-sen University)
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Jin-Chuan Shi (Beihang University) · Miao Wang (Beihang University) · Haobin Duan (Beihang University) · Shaohua Guan (None)
Efficient Dataset Distillation via Minimax Diffusion
Jianyang Gu (Zhejiang University) · Saeed Vahidian (Duke University) · Vyacheslav Kungurtsev (Czech Technical University in Prague) · Haonan Wang (National University of Singapore) · Wei Jiang (Zhejiang University) · Yang You (National University of Singapore) · Yiran Chen (Duke University)
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao (Xiamen University) · Yue Yang (Shanghai Jiaotong University) · Kaipeng Zhang (Shanghai AI Laboratory) · Wenqi Shao (The Chinese University of Hong Kong) · Yuxin Zhang (Xiamen University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory) · Ping Luo (The University of Hong Kong) · Rongrong Ji (Xiamen University)
Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction
Di Wen (None) · Haoran Xu () · Zhaocheng He (SUN YAT-SEN UNIVERSITY) · Zhe Wu (Pengcheng Laboratory) · Guang Tan (Sun Yat-sen University) · Peixi Peng (Peking University)
Towards Accurate Post-training Quantization for Diffusion Models
Changyuan Wang (Tsinghua University) · Ziwei Wang (Tsinghua University, Tsinghua University) · Xiuwei Xu (Tsinghua University, Tsinghua University) · Yansong Tang (Tsinghua University) · Jie Zhou (None) · Jiwen Lu (Tsinghua University)
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Chi Yan (Shanghai AI Laboratory) · Delin Qu (Fudan University) · Dong Wang (Shanghai AI Laboratory) · Dan Xu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology) · Zhigang Wang (Shanghai AI Lab) · Bin Zhao (Northwest Polytechnical University Xi'an) · Xuelong Li (Northwestern Polytechnical University)
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan (Huazhong University of Science and Technology) · Dongyue Wu (None) · Guilin Zhu (Huazhong University of Science and Technology) · Yuanjie Shao (Huazhong University of Science and Technology) · Nong Sang (Huazhong University of Science and Technology) · Changxin Gao (Huazhong University of Science and Technology)
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network
Quan Zhang (SUN YAT-SEN UNIVERSITY) · Lei Wang (SUN YAT-SEN UNIVERSITY) · Vishal M. Patel (Johns Hopkins University) · Xiaohua Xie (SUN YAT-SEN UNIVERSITY) · Jianhuang Lai (SUN YAT-SEN UNIVERSITY)
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang (The University of Tokyo) · Guo Chen (Nanjing University) · Jilan Xu (None) · Mingfang Zhang (None) · Lijin Yang (The University of Tokyo) · Baoqi Pei (Zhejiang University) · Hongjie Zhang (Shanghai Artificial Intelligence Laboratory) · Lu Dong (University of Science and Technology of China) · Yali Wang (SIAT, Chinese Academy of Sciences) · Limin Wang (Nanjing University) · Yu Qiao (Shanghai Artificial Intelligence Laboratory)
DUSt3R: Geometric 3D Vision Made Easy
Shuzhe Wang (Aalto University) · Vincent Leroy (Naver Labs Europe) · Yohann Cabon (Naver Labs Europe) · Boris Chidlovskii (Naver Labs Europe) · Jerome Revaud (Naver Labs Europe)
InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu (National University of Singapore) · Pan Zhou (Sea Group) · Shuicheng Yan (National University of Singapore, Department of Electrical and Computer Engineering) · Xinchao Wang (National University of Singapore)
MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Zeren Jiang (ETHZ - ETH Zurich) · Chen Guo (ETH Zurich) · Manuel Kaufmann (ETH Zurich) · Tianjian Jiang (None) · Julien Valentin (Microsoft) · Otmar Hilliges (None) · Jie Song (ETHZ - ETH Zurich)
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
Rohan Sarkar (Purdue University) · Avinash Kak (Purdue University)
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan (SUN YAT-SEN UNIVERSITY) · Jianhuang Lai (SUN YAT-SEN UNIVERSITY) · Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY) · Jian-Fang Hu (SUN YAT-SEN UNIVERSITY)
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
Haojie Zhang (South China University of Technology) · Yongyi Su (South China University of Technology) · Xun Xu (A*STAR) · Kui Jia (South China University of Technology)
TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
Zhiyuan Ren (Michigan State University) · Minchul Kim (Michigan State University) · Feng Liu (Michigan State University) · Xiaoming Liu (None)
MLP Can Be A Good Transformer Learner
Sihao Lin (Royal Melbourne Institute of Technology) · Pumeng Lyu (Shanghai AI Laboratory) · Dongrui Liu (None) · Tao Tang (SYSU) · Xiaodan Liang (Sun Yat-sen University) · Andy Song (Royal Melbourne Institute of Technology) · Xiaojun Chang (University of Technology Sydney)
Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification
Zhenyu Cui (Peking University) · Jiahuan Zhou (Peking University) · Xun Wang (ByteDance Inc) · Manyu Zhu (bytedance) · Yuxin Peng (Peking University)
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Jing Shi (Adobe Systems) · Wei Xiong (Adobe Systems) · Zhe Lin (Adobe Research) · HyunJoon Jung (Adobe Systems)
Towards a Perceptual Evaluation Framework for Lighting Estimation
Justine Giroux (Université Laval) · Mohammad Reza Karimi Dastjerdi (None) · Yannick Hold-Geoffroy (Adobe Research) · Javier Vazquez-Corral (Computer Vision Center / Autonomous University of Barcelona) · Jean-François Lalonde (Université Laval)
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
Hongchi Xia (Shanghai Jiaotong University) · Yang Fu (University of California San Diego) · Sifei Liu (NVIDIA) · Xiaolong Wang (UCSD)
Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen (Tencent) · Chaoyou Fu (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Peixian Chen (Xiamen University) · Mengdan Zhang (Tencent Youtu Lab) · Ke Li (Tencent) · Xing Sun (Tencent YouTu Lab) · Yunsheng Wu (Tencent YouTu Lab) · Shaohui Lin (East China Normal University) · Rongrong Ji (Xiamen University)
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
Zixuan Wang (None) · Jia Jia (Department of Computer Science and Technology, Tsinghua University) · Shikun Sun (Tsinghua University, Tsinghua University) · Haozhe Wu (Tsinghua University, Tsinghua University) · Rong Han (Tsinghua University, Tsinghua University) · Zhenyu Li (Tsinghua University, Tsinghua University) · Di Tang (ByteDance) · Jiaqing Zhou (bytedance) · Jiebo Luo (University of Rochester)
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Hanwen Jiang (University of Texas at Austin) · Arjun Karpur (Google Research) · Bingyi Cao (Google Research) · Qixing Huang (University of Texas at Austin) · André Araujo (Google Research)
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Linfeng Yuan (Nanjing University of Science and Technology) · Miaojing Shi (King's College London) · Zijie Yue (Tongji University) · Qijun Chen (Tongji University)
Diffusion-FOF: Single-view Clothed Human Reconstruction via Diffusion-based Fourier Occupancy Field
Yuanzhen Li (Wuhan University) · Fei LUO (Wuhan University) · Chunxia Xiao (Wuhan University)
Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering
Chen Zhang (Sensetime) · Wencheng Han (University of Macau) · Yang Zhou (Sensetime Group) · Jianbing Shen (University of Macau) · Cheng-Zhong Xu (University of Macau) · Wentao Liu (Sensetime)
Joint2Human: High-quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang (Tianjin University) · Qiao Feng (None) · Zhuo Su (ByteDance) · Chao Wen (ByteDance) · Zhou Xue (Li Auto) · Kun Li (None)
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
Yunan Zeng (None) · Yan Huang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences) · Jinjin Zhang (Meituan) · Zequn Jie (Meituan) · Zhenhua Chai (Meituan) · Liang Wang (CASIA)
Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren () · Wei Xiong (Adobe Systems) · Jae Shin Yoon (Adobe Systems) · Zhixin Shu (Adobe Systems) · Jianming Zhang (Adobe Systems) · HyunJoon Jung (Adobe Systems) · Guido Gerig (New York University) · He Zhang (Adobe Systems)
eTraM: Event-based Traffic Monitoring Dataset
Aayush Atul Verma (Arizona State University) · Bharatesh Chakravarthi (Arizona State University) · Arpitsinh Vaghela (None) · Hua Wei (Arizona State University) · 'YZ' Yezhou Yang (Arizona State University)
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Mark Hamilton (Massachusetts Institute of Technology) · Andrew Zisserman (University of Oxford) · John Hershey (Google) · William Freeman (MIT and Google)
FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion
George Cazenavette (Massachusetts Institute of Technology) · Avneesh Sud (Google) · Thomas Leung (Google Inc) · Ben Usman (Google Research)
Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen (Tencent AI Lab) · Yong Zhang (Tencent AI Lab) · Xiaodong Cun (Tencent AI Lab) · Menghan Xia (Tencent AI Lab) · Xintao Wang (Tencent) · CHAO WENG (Tencent AI Lab) · Ying Shan (Tencent)
TextNeRF: A Novel Scene-Text Image Synthesis Method based on Neural Radiance Fields
Jialei Cui (Peking University) · Jianwei Du (Southeast University) · Wenzhuo Liu (China University of Mining Technology - Beijing) · Zhouhui Lian (Peking University)
Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe (Amazon) · Violetta Shevchenko (Amazon) · Gil Avraham (Amazon) · Thalaiyasingam Ajanthan (Amazon)
MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Haowen Sun (Tsinghua University, Tsinghua University) · Yueqi Duan (None) · Juncheng Yan (None) · Yifan Liu (Tsinghua University, Tsinghua University) · Jiwen Lu (Tsinghua University)
GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
Van Nguyen Nguyen (Ecole des Ponts ParisTech) · Thibault Groueix (Adobe Systems) · Mathieu Salzmann (EPFL) · Vincent Lepetit (Ecole des Ponts ParisTech)
6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation
Li Xu (Singapore University of Technology and Design) · Haoxuan Qu (Singapore University of Technology and Design) · Yujun Cai (Meta) · Jun Liu (Singapore University of Technology and Design (SUTD))
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
Xiuquan Hou (Xi'an Jiaotong University) · Meiqin Liu (None) · Senlin Zhang (Zhejiang University) · Ping Wei (None) · Badong Chen (Xi'an Jiaotong University)
Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts
Dominik Scheuble (Universität des Saarlandes) · Chenyang Lei (The Hong Kong University of Science and Technology) · Mario Bijelic (Princeton University) · Seung-Hwan Baek (POSTECH) · Felix Heide (Department of Computer Science, Princeton University)
Multi-Attribute Interactions Matter for 3D Visual Grounding
Can Xu (Nanjing University of Science and Technology) · Yuehui Han (Nanjing University of Science and Technology) · Rui Xu (Nanjing University Of Science And Technology) · Le Hui (Nanjing University Of Science And Technology) · Jin Xie (Department of Computer Science, Nanjing University of Science and Technology) · Jian Yang (Nanjing University of Science and Technology)
Bootstrapping Autonomous Driving Radars with Self-Supervised Learning
Yiduo Hao (University of Cambridge) · Sohrab Madani (UIUC) · Junfeng Guan (EPFL - EPF Lausanne) · Mo Alloulah (RadarEye) · Saurabh Gupta (University of Illinois, Urbana Champaign) · Haitham Al Hassanieh (University of Illinois at Urbana-Champaign)
CAD: Photorealistic 3D Generation via Adversarial Distillation
Ziyu Wan (City University of Hong Kong) · Despoina Paschalidou (Stanford) · Ian Huang (Computer Science Department, Stanford University) · Hongyu Liu (Hong Kong University of Science and Technology) · Bokui Shen (Stanford University) · Xiaoyu Xiang (Meta) · Jing Liao (City University of Hong Kong) · Leonidas Guibas (Stanford University)
DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking
Fei Xie (None) · Zhongdao Wang (Huawei Technologies Ltd.) · Chao Ma (Shanghai Jiao Tong University)
SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency
Paul Roetzer (University of Bonn) · Florian Bernard (University of Bonn)
Towards Better Vision-Inspired Vision-Language Models
Yun-Hao Cao (Nanjing University) · Kaixiang Ji (Ant Group) · Ziyuan Huang (National University of Singapore) · Chuanyang Zheng (Ant Group) · Jiajia Liu (Alibaba Group) · Jian Wang (Institute of automation, Chinese academy of science) · Jingdong Chen (Ant Group) · Ming Yang (Ant Group)
Gated Fields: Learning Scene Reconstruction from Gated Videos
Andrea Ramazzina (Saarland University, Universität des Saarlandes) · Stefanie Walz (Mercedes-Benz AG) · Pragyan Dahal (Polytechnic Institute of Milan) · Mario Bijelic (Princeton University) · Felix Heide (Department of Computer Science, Princeton University)
Generative Quanta Color Imaging
Vishal Purohit (Purdue University) · Junjie Luo (Purdue University) · Yiheng Chi (Purdue University) · Qi Guo (Purdue University) · Stanley H. Chan (Purdue University, USA) · Qiang Qiu (Purdue University)
Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
Zijin Yang (None) · Kai Zeng (University of Science and Technology of China) · Kejiang Chen (University of Science and Technology of China) · Han Fang (National University of Singapore) · Weiming Zhang (University of Science and Technology of China) · Nenghai Yu (University of Science and Technology of China)
Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation
Yunhe Gao (Rutgers University)
ParamISP: Learned Forward and Inverse ISPs using Camera Parameters
Woohyeok Kim (POSTECH) · Geonu Kim (POSTECH) · Junyong Lee (None) · Seungyong Lee (POSTECH) · Seung-Hwan Baek (POSTECH) · Sunghyun Cho (POSTECH)
Structured Model Probing: Empowering Efficient Transfer Learning by Structured Regularization
Zhi-Fan Wu (Alibaba DAMO Academy) · Chaojie Mao (Alibaba Group) · Xue Wang (Pennsylvania State University) · Jianwen Jiang (Alibaba DAMO Academy) · Yiliang Lv (Gientech AIL) · Rong Jin (Twitter)
Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction
Mi-Gyeong Gwon (Konkuk University) · Gi-Mun Um (Electronics and Telecommunications Research Institute) · Won-Sik Cheong (Electronics and Telecommunications Research Institute) · Wonjun Kim (Konkuk University)
SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
Lizhe Liu (Xiaomi) · Bohua Wang (Xi'an Jiaotong University) · Hongwei Xie (Xiaomi EV) · Daqi Liu (Xiaomi) · Li Liu (None) · Kuiyuan Yang (DeepMotion) · Bing Wang (Alibaba Group) · Zhiqiang Tian (Xi'an Jiaotong University)
WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion
Khiem Vuong (Carnegie Mellon University) · N. Dinesh Reddy (Carnegie Mellon University) · Robert Tamburo (Carnegie Mellon University) · Srinivasa G. Narasimhan (Carnegie Mellon University)
Data Valuation and Detections in Federated Learning
Wenqian Li (National University of Singapore) · Shuran Fu (National University of Singapore) · Fengrui Zhang (Rutgers University) · Yan Pang (National University of Singapore)
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
Ben Agro (Waabi) · Quinlan Sykora (Waabi) · Sergio Casas (Waabi) · Thomas Gilles (Waabi) · Raquel Urtasun (Waabi)
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim (UNIST) · Kyungdon Joo (None)
Unveiling the Unknown: Unleashing the Power of Unknown to Known in Open-Set Source-Free Domain Adaptation
Fuli Wan (Xi'an University of Electronic Science and Technology) · Han Zhao (Xidian University) · Xu Yang (Xi'an University of Electronic Science and Technology) · Cheng Deng (Xidian University)
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han (University of Oxford) · Max Bain (VGG, University of Oxford) · Arsha Nagrani (Google) · Gül Varol (Ecole des Ponts ParisTech) · Weidi Xie (Shanghai Jiaotong University) · Andrew Zisserman (University of Oxford)
Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner
Mengfei Xia (Tsinghua University, Tsinghua University) · Yujun Shen (The Chinese University of Hong Kong) · Changsong Lei (Tsinghua University, Tsinghua University) · Yu Zhou (Tsinghua University, Tsinghua University) · Deli Zhao (Alibaba Group) · Ran Yi (Shanghai Jiao Tong University) · Wenping Wang (Texas A&M University - College Station) · Yong-Jin Liu (Tsinghua University, Tsinghua University)
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder (UT Austin & Meta AI) · Ziad Al-Halah (University of Utah) · Kristen Grauman (University of Texas at Austin)
Diversity-aware Channel Pruning for StyleGAN Compression
Jiwoo Chung (Sungkyunkwan University) · Sangeek Hyun (Sungkyunkwan University) · Sang-Heon Shim (Sungkyunkwan University) · Jae-Pil Heo (Sungkyunkwan University)
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Feifei Wang (University of Science and Technology of China) · Zhentao Tan (Alibaba DAMO Academy; University of Science and Technology of China) · Tianyi Wei (None) · Yue Wu (Alibaba Group) · Qidong Huang (University of Science and Technology of China)
RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen (National Taiwan University) · Yu Jiet Vong (National Taiwan University) · Sy-Yen Kuo (National Taiwan University) · Sizhuo Ma (Snap Inc.) · Jian Wang (Snap Inc.)
Learned Trajectory Embedding for Subspace Clustering
Yaroslava Lochman (Chalmers University of Technology) · Christopher Zach (Chalmers University) · Carl Olsson (Lund University)
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu (The Chinese University of Hong Kong) · Xiaohang Zhan (Tencent) · Jiaxiang Tang (Baidu) · Ying Shan (Tencent) · Gang Zeng (Peking University) · Dahua Lin (The Chinese University of Hong Kong) · Xihui Liu (The University of Hong Kong) · Ziwei Liu (Nanyang Technological University)
Rethinking Inductive Biases for Surface Normal Estimation
Gwangbin Bae (Imperial College London) · Andrew J. Davison (Imperial College London)
Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction
Xiaoyang Lyu (University of Hong Kong) · Chirui Chang (None) · Peng Dai (None) · Yangtian Sun (None) · Xiaojuan Qi (University of Oxford)
Dynamic Prompt Optimizing for Text-to-Image Generation
Wenyi Mo (Renmin University of China) · Tianyu Zhang (Du Xiaoman Financial) · Yalong Bai (JD AI Research) · Bing Su (None) · Ji-Rong Wen (Renmin University of China) · Qing Yang (Du Xiaoman Technology(BeiJing))
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di (Shanghai Jiao Tong University) · Weidi Xie (Shanghai Jiaotong University)
Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai (Nanyang Technological University) · Shangchen Zhou (Nanyang Technological University) · Blake Li (Nanyang Technological University) · Chongyi Li () · Chen Change Loy (NANYANG TECHNOLOGICAL UNIVERSITY)
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu (Tsinghua University) · Ao Li (Tsinghua University) · Yansong Tang (Tsinghua University) · Wenliang Zhao (Automation, Tsinghua University, Tsinghua University) · Jie Zhou (None) · Jiwen Lu (Tsinghua University)
Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery
Siddharth Tourani (MBZUAI) · Ahmed Alwheibi (Mohamed bin Zayed University of Artificial Intelligence) · Arif Mahmood (Information Technology University, Lahore) · Muhammad Haris Khan (None)
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
Xiaofan Li (East China Normal University) · Zhizhong Zhang (East China Normal University) · Xin Tan (East China Normal University) · Yanyun Qu (Xiamen University) · Chengwei Chen (2nd Military Medical University) · Yuan Xie (East China Normal University) · Lizhuang Ma (Dept. of Computer Sci. & Eng., Shanghai Jiao Tong University)
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang (Tsinghua University, Tsinghua University) · Hui Chen (Tsinghua University, Tsinghua University) · Zijia Lin (Kuaishou Technology) · Jungong Han (Aberystwyth University) · Guiguang Ding (Tsinghua University)
Simple Semantic-Aided Few-Shot Learning
Hai Zhang (Sichuan University) · Junzhe Xu (None) · Shanlin Jiang (The University of Texas at Dallas) · Zhenan He (Sichuan University)
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma (Peking University) · Shiliang Zhang (Peking University) · Longhui Wei (Huawei Cloud Technologies Ltd.) · Qi Tian (Huawei Technologies Ltd.)
An Edit Friendly DDPM Noise Space: Inversion and Manipulations
Inbar Huberman-Spiegelglas (Technion - Israel Institute of Technology) · Vladimir Kulikov (Technion - Israel Institute of Technology) · Tomer Michaeli (Technion)
AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor
Sudong Cai (Kyoto University)
Improved Implicit Neural Representation with Fourier Reparameterized Training
Kexuan Shi (None) · Xingyu Zhou (University of Electronic Science and Technology of China) · Shuhang Gu (University of Electronic Science and Technology of China)
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
You Wu (Institute of Computing Technology, CAS) · Kean Liu (University of the Chinese Academy of Sciences) · Xiaoyue Mi (None) · Fan Tang (Institute of Computing Technology, CAS) · Juan Cao (Institute of Computing Technology, Chinese Academy of Sciences) · Jintao Li (Institute of Computing Technology, Chinese Academy of Sciences)
DaReNeRF: Direction-aware Representation for Dynamic Scenes
Ange Lou (Vanderbilt University) · Benjamin Planche (United Imaging Intelligence) · Zhongpai Gao (United Imaging Intelligence) · Yamin Li (Vanderbilt University) · Tianyu Luan (State University of New York at Buffalo) · Hao Ding (Johns Hopkins University) · Terrence Chen (United Imaging Intelligence) · Jack Noble (Vanderbilt University) · Ziyan Wu (United Imaging Intelligence)
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Jianjian Cao (Fudan University) · Peng Ye (Fudan University) · Shengze Li (Fudan University) · Chong Yu (Fudan University, NVIDIA Corporation) · Yansong Tang (Tsinghua University) · Jiwen Lu (Tsinghua University) · Tao Chen (Fudan University)
COCONut: Modernizing COCO Segmentation
Xueqing Deng (ByteDance Research) · Qihang Yu (Johns Hopkins University) · Peng Wang (Bytedance US AILab) · Xiaohui Shen (ByteDance) · Liang-Chieh Chen (None)
Towards Automated Movie Trailer Generation
Dawit Argaw Argaw (None) · Mattia Soldan (None) · Alejandro Pardo (KAUST) · Chen Zhao (King Abdullah University of Science and Technology (KAUST)) · Fabian Caba Heilbron (Adobe Research) · Joon Chung (KAIST) · Bernard Ghanem (KAUST)
How to Configure Good In-Context Sequence for Visual Question Answering
Li Li (Southeast University) · Jiawei Peng (Southeast University) · Huiyi Chen (Southeast University - Monash University Joint Graduate School (Suzhou)) · Chongyang Gao (Northwestern University) · Xu Yang (Southeast University)
Capturing Closely Interacted Two-Person Motions with Reaction Priors
Qi Fang (NetEase) · Yinghui Fan () · Yanjun Li (None) · Junting Dong (None) · Dingwei Wu (NetEase, Inc.) · Weidong Zhang (Netease Games AI Lab) · Kang Chen ()
PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding
Xuesong Nie () · Haoyuan Jin (Zhejiang University) · Yunfeng Yan (Zhejiang University) · Xi Chen (The University of Hong Kong) · Zhihang Zhu (Zhejiang University) · Donglian Qi (Zhejiang University)
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue (None) · Kumar Ashutosh (UT Austin & FAIR, Meta) · Kristen Grauman (University of Texas at Austin)
Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images
JungEun Kim (Korea Advanced Institute of Science and Technology) · Hangyul Yoon (Korea Advanced Institute of Science and Technology (KAIST)) · Geondo Park (Korea Advanced Institute of Science and Technology) · Kyungsu Kim (Harvard Medical School and Massachusetts General Hospital) · Eunho Yang (Korea Advanced Institute of Science & Technology)
PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
Qi Zhao (None) · M. Salman Asif (University of California, Riverside) · Zhan Ma (Nanjing University)
G$^3$-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding
Yuan Wang (None) · Yali Li (Tsinghua University) · Shengjin Wang (Tsinghua University)
NightCC: Nighttime Color Constancy via Adaptive Channel Masking
Shuwei Li (National University of Singapore) · Robby T. Tan (National University of Singapore)
DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning
Yuhang He (Xi'an Jiaotong University) · YingJie Chen (Xi'an Jiaotong University) · Yuhan Jin (Xi'an Jiaotong University) · Songlin Dong (Xi'an Jiaotong University) · Xing Wei (None) · Yihong Gong (Xi'an Jiaotong University)
Harnessing Large Language Models for Training-free Video Anomaly Detection
Luca Zanella (University of Trento) · Willi Menapace (University of Trento) · Massimiliano Mancini (University of Trento) · Yiming Wang (Fondazione Bruno Kessler) · Elisa Ricci (University of Trento)
ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing
Zhongze Wang (East China University of Science and Technology) · Haitao Zhao (East China University of Science and Technology) · Jingchao Peng (East China University of Science and Technology) · Lujian Yao (East China University of Science and Technology) · Kaijie Zhao (None)