
CVPR 2024 Workshop Challenge Awardees

Challenge and Competition Winners


6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW)

Challenge period: 13 Jan – 19 March 2024
Workshop Date: 18 June 2024

The Competition is a continuation of the ABAW Competitions held at CVPR 2023 and 2022, ECCV 2022, ICCV 2021, IEEE FG 2020, and CVPR 2017. It includes the following five Challenges:

  1. Valence-Arousal Estimation Challenge to estimate the two continuous affect dimensions of valence and arousal in each frame of the utilized Challenge corpora.
  2. Expression Recognition Challenge to distinguish between eight mutually exclusive classes in each frame of the utilized Challenge corpora.
  3. Action Unit Detection Challenge to detect which of the 12 Action Units are activated in each frame of the utilized Challenge corpora.
  4. Compound Expression Recognition Challenge to distinguish between the 7 mutually exclusive classes in each frame of the utilized Challenge corpora.
  5. Emotional Mimicry Intensity Estimation Challenge to predict the following six emotional dimensions: "Admiration", "Amusement", "Determination", "Empathic Pain", "Excitement", and "Joy".


  • Wei Zhang, Feng Qiu, Chen Liu, Lincheng Li, Heming Du, Tiancheng Guo and Xin Yu, Netease Fuxi AI Lab

The 4th Workshop of Adversarial Machine Learning on Computer Vision: Robustness of Foundation Models

Challenge Period: 15 March - 20 May 2024

Workshop Date: 17 June 2024

As we observe the rapid evolution of large foundation models, their incorporation into the automotive landscape gains prominence, owing to their advanced capabilities. However, this progress is not without its challenges, and one such formidable obstacle is the vulnerability of these large foundation models to adversarial attacks. Adversaries could employ techniques such as adversarial textures and camouflage, along with other sophisticated strategies, to manipulate the perception, prediction, planning, and control mechanisms of the vehicle. These actions pose risks to the reliability and safety of autonomous driving systems.

To advance research on building robust models for autonomous driving systems (ADS), we organize this challenging track to motivate novel adversarial attack algorithms against a spectrum of vision tasks in the realm of ADS, spanning image recognition, object detection, and semantic segmentation. Participants are encouraged to develop attack methods that generate subtle yet harmful images to expose vulnerabilities of the large foundation models used in ADS.


Challenge: Black-box Adversarial Attacks on Vision Foundation Models

  • First place: Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Shuyong Gao, Wenqiang Zhang; Fudan University, China
  • Second place: Nhat Chung, Sensen Gao, Ying Yang; Centre for Frontier AI Research (CFAR) and Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore
  • Third place: Hongda Chen, Xiyuan Wang, Xiucong Zhao, Jinxuan Zhao, Shuai Liu, Zhengyu Zhao, Chenhao Lin, Chao Shen; Xi’an Jiaotong University, China

Distinguished Paper Award

  • Siyuan Liang, Kuanrong Liu, Jiajun Gong, Jiawei Liang, Yuan Xuan, Ee-Chien Chang, Xiaochun Cao. Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning

Agriculture-Vision Challenges and Opportunities for Computer Vision in Agriculture

Challenge Period: 1 Feb – 3 June 2024

Workshop Date: 18 June 2024

The core of this challenge is to strategically use both the extended, unlabeled agriculture vision dataset and the labeled original agriculture vision dataset. The aim is to enhance model performance by effectively applying semi-supervised learning techniques. Participants are challenged to integrate the depth and variety of the unlabeled dataset with the structured framework of the labeled dataset to achieve superior results.
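One common semi-supervised technique that fits this setup is pseudo-labeling: a model trained on the labeled set predicts on the unlabeled set, and only confident predictions are kept as extra training labels. The sketch below is illustrative only and is not the method of any winning team; the function name `select_pseudo_labels`, the 0.9 threshold, and the toy probabilities are assumptions made for the example.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Return (sample_index, predicted_class) pairs for confident predictions.

    `probabilities` holds one list of per-class probabilities per unlabeled
    sample. The threshold value is a hypothetical choice, not a challenge rule.
    """
    selected = []
    for index, class_probs in enumerate(probabilities):
        # Pick the class with the highest predicted probability.
        best_class = max(range(len(class_probs)), key=class_probs.__getitem__)
        # Keep the sample only if the model is confident enough.
        if class_probs[best_class] >= threshold:
            selected.append((index, best_class))
    return selected


# Toy predictions for three unlabeled samples over three classes.
unlabeled_predictions = [
    [0.95, 0.03, 0.02],  # confident: class 0
    [0.40, 0.35, 0.25],  # ambiguous: discarded
    [0.05, 0.05, 0.90],  # confident: class 2
]
pseudo_labels = select_pseudo_labels(unlabeled_predictions, threshold=0.9)
print(pseudo_labels)
```

The selected pairs would then be merged with the labeled set for another round of training; raising the threshold trades fewer pseudo-labels for cleaner ones.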


  • 1st place: Quan Huu Cap, Allis Inc, Japan
  • 2nd Place: Wang Liu, PhD candidate of Hunan University, China
  • 3rd place (tie): Key Laboratory of Intelligent Perception and Image Understanding, School of Artificial Intelligence, Xidian University, Xiaoqiang Lu, Licheng Jiao, Xu Liu, Lingling Li, Fang Liu, Wenping Ma, Shuyuan Yang.
  • 3rd place (tie): Team RSIDEA: Jingtao Li, Xiao Jiang, Chen Sun, Yang Pan, Hengwei Zhao, Xinyu Wang, Yanfei Zhong

SPARK Challenge @ AI4SPACE Workshop

Challenge Period: 15 January to 28 March 2024
Workshop Date: 17 June 2024

The 3rd edition of SPARK (SPAcecraft Recognition leveraging Knowledge of Space Environment), organized as part of the AI4Space workshop in conjunction with IEEE/CVF CVPR 2024, focuses on designing data-driven approaches for spacecraft component segmentation and trajectory estimation. SPARK uses data synthetically simulated with a game engine alongside data collected from the Zero-G lab at the University of Luxembourg. The SPARK challenge was organized by the Computer Vision, Imaging & Machine Intelligence Research Group (CVI²) at the Interdisciplinary Centre for Security, Reliability and Trust (SnT) of the University of Luxembourg (UL).

The challenge comprises two streams:

  • Stream 1: Spacecraft Semantic Segmentation: This stream aims to segment spacecraft components by classifying each pixel's value based on the component it belongs to. The training and validation datasets consist of RGB images with corresponding class labels and segmentation masks. An additional test set is provided for evaluation and ranking.
  • Stream 2: Spacecraft Trajectory Estimation: This stream challenges participants to leverage temporal data to estimate the 6DoF (six degrees of freedom) pose of the spacecraft. The training and validation datasets are composed of multiple sets of RGB images, each representing a rendezvous trajectory with associated position and orientation labels. Participants received a separate test set (without ground truth labels) for the evaluation process.
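To make the 6DoF pose in Stream 2 concrete, the sketch below represents a pose as a 3D position plus a unit quaternion and computes a translation error and a rotation error between an estimate and a reference. The `Pose` class, the (w, x, y, z) quaternion ordering, and both error functions are assumptions for illustration; the official SPARK label format and ranking metric may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """A 6DoF pose: 3 translational plus 3 rotational degrees of freedom."""
    position: tuple     # (x, y, z)
    orientation: tuple  # unit quaternion, assumed order (w, x, y, z)


def translation_error(a, b):
    """Euclidean distance between the two positions."""
    return math.dist(a.position, b.position)


def rotation_error_deg(a, b):
    """Angle in degrees of the rotation that takes one orientation to the other."""
    dot = abs(sum(p * q for p, q in zip(a.orientation, b.orientation)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


# Reference pose, and an estimate offset by 0.5 units and rotated 45 degrees about z.
reference = Pose((0.0, 0.0, 10.0), (1.0, 0.0, 0.0, 0.0))
estimate = Pose((0.3, 0.4, 10.0),
                (math.cos(math.pi / 8), 0.0, 0.0, math.sin(math.pi / 8)))

t_err = translation_error(reference, estimate)
r_err = rotation_error_deg(reference, estimate)
print(round(t_err, 3), round(r_err, 3))
```

Taking the absolute value of the quaternion dot product accounts for the fact that q and -q describe the same rotation.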


  • Spacecraft Semantic Segmentation 
    Team RUNZE; Rengang Li, Baoyu Fan, Runze Zhang, Xiaochuan Li, Lu Liu, Yanwei Wang, Zhenhua Guo, and Yaqian Zhao
  • Spacecraft Trajectory Estimation
    Team CSU_NUAA_PANG; Jianhong Zuo, Shengyang Zhang, Qianyu Zhang, Yutao Zhao, Baichuan Liu, and Aodi Wu

8th AI City Challenge

Challenge Period: 22 Jan – 25 March, 2024

Workshop Date: 17 June 2024

The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS). The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions.


  • Track 1 – Multi-Camera People Tracking

Winner: Shanghai Jiao Tong University + AI Lab, Lenovo Research + Monash University; Zhenyu Xie, Zelin Ni, Wenjie Yang, Yuang Zhang, Yihang Chen, Yang Zhang, Xiao Ma

Runner-Up: Yachiyo Engineering Co., Ltd. + Chubu University; Ryuto Yoshida, Junichi Okubo, Junichiro Fujii, Masazumi Amakata, Takayoshi Yamashita

  • Track 2 – Traffic Safety Description and Analysis

Winner: Alibaba Group; Zhizhao Duan, Hao Cheng, Duo Xu, Xi Wu, Xiangxie Zhang, Xi Ye, Zhen Xie

Runner-Up: Ho Chi Minh City University of Technology, VNU-HCM, Vietnam + University of Science, VNU-HCM, Vietnam + University of Information Technology, VNU-HCM, Vietnam + FPT Telecom, Vietnam + AI Lab- AI VIETNAM + Vietnam National University Ho Chi Minh City; Khai Trinh Xuan, Khoi Nguyen Nguyen, Bach Hoang Ngo, Vu Dinh Xuan, Minh-Hung An, Quang-Vinh Dinh

  • Track 3 – Naturalistic Driving Action Recognition

Winner: China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd.; Tiantian Zhang, Qingtian Wang, Xiaodong Dong, Wenqing Yu, Hao Sun, Xuyang Zhou, Aigong Zhen, Shun Cui, Dong Wu, Zhongjiang He

Runner-Up: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea; Huy-Hung Nguyen, Chi Dai Tran, Long Hoang Pham, Duong Nguyen-Ngoc Tran, Tai Huu-Phuong Tran, Duong Khac Vu, Quoc Pham-Nam Ho, Ngoc Doan-Minh Huynh, Hyung-Min Jeon, Hyung-Joon Jeon, Jae Wook Jeon

  • Track 4 – Road Object Detection in Fish-Eye Cameras

Winner: VNPTAI, VNPTGroup, Hanoi, Vietnam + Faculty of Computer Science, Phenikaa University, Hanoi, Vietnam + University of Transport and Communications, Hanoi, Vietnam; Viet Hung Duong, Duc Quyen Nguyen, Thien Van Luong, Huan Vu, Tien Cuong Nguyen

Runner-Up: Nota Inc., Republic of Korea; Wooksu Shin, Donghyuk Choi, Hancheol Park, Jeongho Kim

Honorable Mention: Department of Electrical and Computer Engineering Sungkyunkwan University; Long Hoang Pham, Quoc Pham-Nam Ho, Duong Nguyen-Ngoc Tran, Tai Huu-Phuong Tran, Huy-Hung Nguyen, Duong Khac Vu, Chi Dai Tran, Ngoc Doan-Minh Huynh, Hyung-Min Jeon, Hyung-Joon Jeon, Jae Wook Jeon

  • Track 5 – Detecting Violation of Helmet Rule for Motorcyclists

Winner: University of Information Technology, VNU-HCM, Vietnam + Vietnam National University, Ho Chi Minh City, Vietnam; Hao Vo, Sieu Tran, Duc Minh Nguyen, Thua Nguyen, Tien Do, Duy-Dinh Le, Thanh Duc Ngo

Runner-Up: China Mobile Shanghai ICT Co.,Ltd; Yunliang Chen, Wei Zhou, Zicen Zhou, Bing Ma, Chen Wang, Yingda Shang, An Guo, Tianshu Chu

AI for Content Creation

Workshop Date: 17 June 2024

The AI for Content Creation (AI4CC) workshop at CVPR brings together researchers in computer vision, machine learning, and AI. Content creation is required for simulation and training data generation, media like photography and videography, virtual reality and gaming, art and design, and documents and advertising (to name just a few application domains). Recent progress in machine learning, deep learning, and AI techniques has allowed us to turn hours of manual, painstaking content creation work into minutes or seconds of automated or interactive work. For instance, generative adversarial networks (GANs) can produce photorealistic images of 2D and 3D items such as humans, landscapes, interior scenes, virtual environments, or even industrial designs. Neural networks can super-resolve and super-slomo videos, interpolate between photos with intermediate novel views and even extrapolate, and transfer styles to convincingly render and reinterpret content. In addition to creating awe-inspiring artistic images, these offer unique opportunities for generating additional and more diverse training data. Learned priors can also be combined with explicit appearance and geometric constraints, perceptual understanding, or even functional and semantic constraints of objects.


AIS: Vision, Graphics and AI for Streaming

Challenge Period: 5 Feb - 29 March, 2024

Workshop Date: 17 June 2024

Welcome to the 1st Workshop on AI for Streaming at CVPR! This workshop focuses on unifying new streaming technologies, computer graphics, and computer vision from a modern deep learning point of view. Streaming is a huge industry in which hundreds of millions of users demand high-quality content every day on different platforms. Computer vision and deep learning have emerged as revolutionary forces for rendering content, image and video compression, enhancement, and quality assessment. From neural codecs for efficient compression to deep learning-based video enhancement and quality assessment, these advanced techniques are setting new standards for streaming quality and efficiency. Moreover, novel neural representations pose new challenges and opportunities in rendering streamable content and make it possible to redefine computer graphics pipelines and visual content.

This Workshop consisted of 5 different challenges:


  • See this document for certificates outlining each challenge’s winner.

Workshop on Autonomous Driving (WAD)

Waymo Open Dataset Challenges

Challenge Date: 22 May 2024

Workshop Date: 17 June 2024

The 5th Waymo Open Dataset Challenge edition covered a range of fundamental research topics in the autonomous driving domain and included:

  • Motion Prediction: Given the past 1 second of agent history on a corresponding map and the associated lidar and camera data for this time interval, predict the positions of up to 8 agents for 8 seconds into the future. Use of lidar and camera data is optional.
  • Sim Agents: Given the agent tracks for the past 1 second on a corresponding map, and optionally the associated lidar for this time interval, simulate 32 realistic joint futures for all the agents in the scene. It’s the second year we are running the Sim Agents Challenge, but participants can now leverage our new Waymax simulator that we made available to the research community a few months ago.
  • 3D Semantic Segmentation: Given one or more lidar range images and the associated camera images, produce a semantic class label for each lidar point.
  • Occupancy and Flow Prediction: Predict the bird’s-eye view (BEV) roadway occupancy and motion flow of all observed and occluded vehicles given observed agent tracks for the last second.


  • Challenge 1: Motion Prediction Challenge: Chen Shi, Shaoshuai Shi, Li Jiang, Zikang Zhou, Jianping Wang, Yung-Hui Li, Yu-Kai Huang, Jiawei Sun, Jiahui Li, Tingchen Liu, Chengran Yuan, Shuo Sun, Yuhang Han, Keng Peng Tee, Anthony Wong, Marcelo H. Ang Jr.
  • Challenge 2: Sim Agents Challenge: Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue, Zhiyu Huang, Zixu Zhang, Jaime Fernández Fisac, Chen Lv, Zhejun Zhang, Christos Sakaridis, Luc Van Gool
  • Challenge 3: 3D Semantic Segmentation Challenge: Xiaoyang Wu, Xiang Xu, Lingdong Kong, Liang Pan, Ziwei Liu, Tong He, Wanli Ouyang, Hengshuang Zhao, Qing Wu, Osama Amjad, Ammad Nadeem
  • Challenge 4: Occupancy Flow Challenge: Haochen Liu, Zhiyu Huang, Wenhui Huang, Haohan Yang, Xiaoyu Mo, Hongyang Gao, Chen Lv, Gaeun Kim, Daeil Han, YeongJun Koh, Hanul Kim, Zhan Chen, Chen Tang, Lu Xiong

AVA: Accessibility, Vision, and Autonomy Meet Challenge

Workshop Date: 18 June 2024

Computer vision and machine learning benchmarks for understanding humans generally lack diversity, rarely incorporating individuals with mobility aids. To address this gap, our challenge leverages a large synthetic benchmark for accessibility-oriented computer vision tasks, particularly in the context of embodied navigation and autonomous systems. The benchmark is quite challenging, spanning different camera perspectives, ambient settings (weathers, towns), dense social navigation settings, and diverse pedestrian mobility aids. There are two main tasks: instance segmentation and keypoint detection. The evaluation follows COCO AP evaluation but with introduced fine-grained categories, e.g., classes such as 'cane' and 'wheelchair'. Such classes confound state-of-the-art vision models; for instance, pedestrians with a cane or in wheelchairs tend to result in degraded performance. The challenge was hosted on Eval.AI (both for segmentation and keypoints).
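For the instance segmentation task, COCO-style AP is built on the intersection over union (IoU) between a predicted mask and a ground-truth mask of the same category, with a prediction counted as a match at IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that IoU building block on toy masks; the function name `mask_iou` and the example masks are assumptions, and the challenge's actual evaluation code on Eval.AI is not reproduced here.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks have no overlap to measure; report 0.0 by convention here.
    return intersection / union if union else 0.0


# Toy 3x3 masks for one hypothetical 'wheelchair' instance.
predicted = [[1, 1, 0],
             [1, 1, 0],
             [0, 0, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 1],
                [0, 0, 0]]

iou = mask_iou(predicted, ground_truth)

# COCO-style thresholds: 0.50, 0.55, ..., 0.95.
thresholds = [0.50 + 0.05 * i for i in range(10)]
matched_at = [t for t in thresholds if iou >= t]
print(round(iou, 3), len(matched_at))
```

Here the masks share 2 of 6 covered pixels, so the prediction would not count as a match at any of the ten thresholds. Keypoint detection in COCO uses object keypoint similarity instead of mask IoU, which is not shown.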


Challenge 1: Instance Segmentation Challenge

  • First place: Xiangheng Shan, Huayu Zhang, Jialong Zuo, Nong Sang and Changxin Gao; School of Artificial Intelligence and Automation, Huazhong University of Science and Technology.
  • Second place: Xiaoqiang Lu, Licheng Jiao, Xu Liu, Lingling Li, Fang Liu, Wenping Ma, Shuyuan Yang; Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University.
  • Third place: Qin Ma, Jinming Chai, and Zhongjian Huang; Intelligent Perception and Image Understanding Lab, Xidian University.

Challenge 2: Pose Estimation Challenge

  • First place: Jiajun Fu, Shaojie Zhang, Jianqin Yin; Beijing University of Posts and Telecommunications
  • Second place: Chuchu Xie; Tsinghua University
  • Third place: Seungjun Lee; SKTelecom

The 7th Workshop and Challenge Bridging the Gap between Computational Photography and Visual Recognition (UG2+)

Workshop date: 17 June 2024

With the development and successes of computer vision algorithms, applying these methodologies to real-world environments becomes increasingly feasible. However, real-world applications may involve degradations which are not often included in standard training and testing datasets: poor illumination, adverse weather conditions, aberrated optics, or complicated sensor noise. In addition, non-standard imaging solutions such as those for flexible wearables or augmented reality headsets may require unconventional processing algorithms, which complicate their path to real-world application. What is the gap in performance for the current state of the art when placed into harsh, real-world environments? The central theme of the workshop is to invite researchers to investigate this question and push the state of the art forward for real-world application of recognition tasks in challenging environments. Continuing the history of success at CVPR 2018–2023, we present its 7th edition for CVPR 2024. It inherits the successful benchmark dataset, platform and other evaluation tools used in previous UG2+ workshops, and broadens its scope with new tracks and applications within the context of real-world application of computer vision algorithms.


UG2+ Challenge winners:

Challenge 1: Atmospheric turbulence mitigation winners

  • Subtrack 1: Run Sun, Shengqi, Yuxing Duan, Xiaoyi Zhang, Shuning Cao, Hanyu Zhou, Yi Chang, Luxin Yan
  • Subtrack 2: Run Sun, Shengqi, Yuxing Duan, Xiaoyi Zhang, Shuning Cao, Hanyu Zhou, Yi Chang, Luxin Yan

Challenge 2: All weather semantic segmentation

  • Nan Zhang, Xidan Zhang, Jianing Wei, Fangjun Wang, Zhiming Tan

Challenge 3: UAV tracking and pose estimation

  • Tianchen Deng, Yi Zhou, Wenhua Wu, Mingrui Li, Jingwei Huang, Shuhong Liu, Yanzeng Song, Hao Zuo, Yanbo Wang, Hesheng Wang and Weidong Chen

5th Chalearn Face Anti-spoofing Workshop and Challenge

Challenge Period: 1 February – 14 April 2024
Workshop Date: 17 June 2024

In recent years the security of face recognition systems has been increasingly threatened. Face Anti-spoofing (FAS) is essential for securing face recognition systems against various attacks. In order to attract researchers and push forward the state of the art in Face Presentation Attack Detection (PAD), we organized four editions of the Face Anti-spoofing Workshop and Competition at CVPR 2019, CVPR 2020, ICCV 2021, and CVPR 2023, which together have attracted more than 1200 teams from academia and industry and greatly advanced algorithms for overcoming many challenging problems. In addition to physical presentation attacks (PAs), such as printing, replay, and 3D mask attacks, digital face forgery attacks (FAs) are still a threat that seriously endangers the security of face recognition systems. FAs aim to attack faces using digital editing at the pixel level, such as identity transformation, facial expression transformation, attribute editing, and facial synthesis. At present, detection algorithms for these two types of attacks, "Face Anti-spoofing (FAS)" and "Deep Fake/Forgery Detection (DeepFake)", are still being studied as independent computer vision tasks, and cannot provide a unified detection model that responds to both types of attacks simultaneously. To give continuity to our efforts on these problems, we are proposing the 5th Face Anti-Spoofing Workshop@CVPR 2024. We identify the different types of attack clues as the main reason for the incompatibility between these two detection tasks. The spoofing clues based on physical presentation attacks are usually caused by color distortion, screen moire patterns, and production traces. In contrast, the forgery clues based on digital editing attacks are usually changes in pixel values. The fifth competition aims to encourage the exploration of common characteristics in these two types of attack clues and promote the research of unified detection algorithms.
Fully considering the above difficulties and challenges, we collected a Unified physical-digital Attack dataset, namely UniAttackData, for this fifth edition for algorithm design and competition promotion, including 1,800 participations with 2 and 12 physical and digital attacks, respectively, with a total of 28,706 videos.


Track 1: Unified Physical-Digital Face Attack Detection

  • First Place: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Xi Li, Yao Wang, Binjie Mao, Pengfei Yan (Meituan)
  • Second Place: Minzhe Huang, Changwei Nie, Weihong Zhong, Zehua Lan, Ruyi Cai, Xuanwu Yun (Akuvox)
  • Third Place: Jiaruo Yu, Dagong Lu, Xingyue Shi, Chenfan Qu, Fengjun Guo (INTSIG Information Co. Ltd)

Track 2: Snapshot Spectral Imaging Face Anti-spoofing

  • First Place: Zhaofan Zou, Hui Li, Yaowen Xu, Zhixiang He (China Telecom Artificial Intelligence Technology Co. Ltd)
  • First Place: Chuanbiao Song, Jun Lan, Yan Hong, Brian Zhao (Ant Group)
  • Third Place: Minzhe Huang, Zehua Lan, Ruyi Cai, Xuanwu Yun (Akuvox)

4th Workshop on Computer Vision in the Built Environment
Workshop date: 18 June 2024

The 4th International Scan-to-BIM competition targets the development of computer vision methods that automatically generate the semantic as-is status of buildings given their 3D point clouds. Specifically, the challenge includes two tracks: (i) 2D Floor Plan Reconstruction: given a 3D point cloud as input, this track aims at automatically reconstructing the 2D vectorized floorplan of the captured area; and (ii) 3D Building Model Reconstruction: given a 3D point cloud as input, this track aims at reconstructing the 3D parametric semantic model of the captured area. The evaluation is based on geometric, semantic, and topological metrics.


  • Challenge #1: 2D Floor Plan Reconstruction: Longyong Wu, Ziqi Li, Meng Sun, Fan Xue
  • Challenge #2: 3D Building Model Reconstruction:
    1. Siyuan Meng, Sou-Han Chen, Jiajia Wang, Fan Xue
    2. Sam De Geyter, Maarten Bassier, Heinder De Winter, Maarten Vergauwen
    3. Fabian Kaufmann, Mahdi Chamseddine, Jason Rambach

5th Workshop on Continual Learning in Computer Vision
Challenge Period: 20 February - 18 May 2024
Workshop Date: 18 June 2024

This challenge aims to explore techniques that combine two fundamental aspects of data efficiency: continual learning and unlabelled data usage. In real-world machine learning applications, it is fair to assume that data is not cheap to acquire, store and label. Therefore, it is crucial to develop strategies that are flexible enough to learn from streams of experiences, without forgetting what has been learned previously. Additionally, contextual unlabelled data can often be provided for a fraction of the cost of labeled data and could be exploited to integrate additional information into the model.


  • First Place: Team NJUST-KMG; Sishun Pan, Tingmin Li, Yang Yang
  • Second Place: Team SNUMPR; Taeheon Kim, San Kim, Dongjae Jeon, Minhyuk Seo, Wonje Jeong, Jonghyun Choi
  • Third Place: Team CtyunAI; Chengkun Ling, Weiwei Zhou
  • Fourth Place: Team PM-EK; Panagiota Moraiti, Efstathios Karypidis

Data Curation and Augmentation in Medical Imaging (DCAMI) Workshop
Workshop Date: 17 June 2024

Award Period: 31 January to 22 March 2024

Data-driven computer vision and AI solutions for medical imaging have great potential to make a real-life impact by improving patient care. However, safety requirements associated with healthcare pose major challenges for this research field, especially regarding data curation. Collection and annotation of medical data is often resource-intensive due to the need for medical expertise. At the same time, data quality is of the highest importance to ensure safe and fair usage in clinical settings. As a result, efficient data curation and validation, learning from small data, and data synthesis are important areas of research.

In addressing these demands, data engineering emerges as a crucial driver in advancing medical imaging research into deployment. The paper awards aim to promote works on topics related to medical imaging and data engineering, and push forward the frontier of data curation and augmentation for medical applications to tackle the challenges of limited or imperfect data in the real world.


  • Best Paper Award: Abril Corona-Figueroa, Hubert P. H. Shum, Chris G. Willcocks
    "Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling"
  • Best Paper Runner-Up Award: Soham Gadgil, Alex DeGrave, Roxana Daneshjou, Su-In Lee
    "Discovering mechanisms underlying AI prediction of protected attributes via data auditing"
  • Bench-to-Bedside Award: Hallee E. Wong, Marianne Rakic, John Guttag, Adrian V Dalca
    "ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image"
  • Best Poster Award Winner: Adway Kanhere

Domain Adaptation, Explainability and Fairness in AI for Medical Image Analysis (DEF-AI-MIA) Workshop
Challenge Period: 8 January - 19 March 2024
Workshop Date: 17 June 2024

The DEF-AI-MIA COV19D Competition was the 4th in the series of COV19D Competitions, following the first 3 Competitions held in the framework of the ICCV 2021, ECCV 2022 and ICASSP 2023 Conferences, respectively. It included two Challenges: i) Covid-19 Detection Challenge and ii) Covid-19 Domain Adaptation Challenge. Both Challenges were based on the COV19-CT-DB database, which includes chest CT scan series. The 1st Challenge aimed to build effective models for fast and reliable Covid-19 diagnosis based on 3-D chest CT scans, whilst the 2nd Challenge aimed to use Deep Learning Domain Adaptation methodologies to diagnose Covid-19 across annotated datasets from some medical sources and non-annotated datasets from other medical sources.


  • COVID19 Detection Challenge
    Team MDAP; Robert Turnbull and Simon Mutch
  • COVID19 Domain Adaptation Challenge
    Team FDVTS; Runtian Yuan, Qingqiu Li, Jilan Xu, Rui Feng, Yuejie Zhang, Junlin Hou, Hao Chen

The Fifth Annual Embodied Artificial Intelligence Workshop

Challenge Period: March-June 2024

Workshop Date: 18 June 2024

Minds live in bodies, and bodies move through a changing world. The goal of embodied artificial intelligence is to create agents, such as robots, which learn to creatively solve challenging tasks requiring interaction with the environment. Fantastic advances in deep learning and the increasing availability of large datasets like ImageNet have enabled superhuman performance on a variety of AI tasks previously thought intractable. These advances have supercharged embodied AI, enabling a growing collection of researchers to make rapid progress towards intelligent agents which can:

  • See: perceive their environment through vision or other senses.
  • Talk: hold a natural language dialog grounded in their environment.
  • Listen: understand and react to audio input anywhere in a scene.
  • Act: navigate and interact with their environment to accomplish goals.
  • Reason: consider and plan for the long-term consequences of their actions.

The goal of the Embodied AI Workshop is to bring together researchers from computer vision, language, graphics, and robotics to share and discuss the latest advances in embodied intelligent agents. The 2024 workshop hosted six exciting challenges covering a wide range of topics such as mobile manipulation, visual/tactile skill learning, and visual, language and social navigation. Our winners include:

  • The ARNOLD Challenge: Language-Grounded Manipulation
    • First Place - Robot AI: Yinchao Ma, Liangsheng Liu, Runhui Bao, Wenfei Yang, Shifeng Zhang, Xu Zhou, Tianzhu Zhang (University of Science and Technology of China, Deep Space Exploration Laboratory, Sangfor Research Institute)
    • Second Place - Fun Guy: Wang Sen, Li Jiayi, Niu Ye, Li Meixuan, Wang Le, Zhou Sanping (Xi’an Jiaotong University)
    • Third Place - SSU Reality Lab: Sangmin Lee, Sungyong Park, and Heewon Kim (Soongsil University)
  • The HAZARD Challenge: Multi-Object Rescue
    • Winners forthcoming
  • The HomeRobot Open Vocabulary Mobile Manipulation Challenge
    • First Place - UniTeam: Andrew Melnik, Michael Büttner, Lyon Brown, Gaurav Kumar Yadav, Arjun PS, Gora Chand Nandi, Jonathan Francis (Bielefeld University (Germany), IIIT-A (India), CMU (USA))
    • Second Place - Rulai: Yansen Han, Qiting Ye, Yaping Zhao, Yang Luo, Jinxin Zhu, Xuan Gu, Bingyi Lu, Qinyuan Liu, Chenxiao Dou*, Yansong Chua (China Nanhu Academy of Electronics and Information Technology)
    • Third Place - KuzHum: Volodymyr Kuzma, Vladyslav Humennyy, Ruslan Partsey (Ukrainian Catholic University)
  • The ManiSkill-ViTac Challenge 2024: Vision-based-Tactile Manipulation Skill Learning
    • First Place - Illusion: Yunxiang Cai, Xuetao Li, Shengheng Ma, Haiqiang Zheng, Fang Gao (Guangxi University)
    • Second Place - Luban (Tie): Xiaohang Shi, Haoran Zheng, Piaopiao Jin, Leyuan Yan, Ange Bao, Pei Zhao (Zhejiang University)
    • Second Place - TouchSight Innovators (Tie): Zhiyuan Wu, Xuyang Zhang, Jiaqi Jiang, Daniel Gomes, Shan Luo (Tongji University, King’s College London)
  • MultiON: Multi-Object Navigation
    • First Place - IntelliGO: Francesco Taioli, Marco Cristani, Alberto Castellini, Alessandro Farinelli, Yiming Wang (University of Verona, FBK, Polytechnic of Turin)
  • PRS Challenge: Human Society Integration Exploration
    • First Place - PDA: Xingrui Wang, Feng Wang (Johns Hopkins University, University of Southern California)

For more information on these challenges and their winners, see the individual challenge websites or the Embodied AI Workshop website, which covers present and past challenges.

First Joint Egocentric Vision (EgoVis) Workshop

Workshop date: 17 June 2024

The first edition of the EgoVis workshop saw the participation of researchers from diverse backgrounds, geographical regions and institutions, both academic and industrial. Aiming to be the focal point for the egocentric computer vision community to meet and discuss progress in this fast-growing research area, the workshop addresses egocentric vision in a comprehensive manner, including keynote talks, research presentations, an awards ceremony, and key research challenges in video understanding across several benchmark datasets covering multi-modal data, interaction learning, self-supervised learning, and AR/VR, with applications to cognitive science and robotics. This year, the workshop offered 31 different challenges across six egocentric benchmark datasets, namely HoloAssist, Aria Digital Twin, Aria Synthetic Environments, Ego4D, Ego-Exo4D, and EPIC-KITCHENS. Winning teams shared their insights either through invited talks or poster presentations.


  • See this URL for the list of awardees.

Foundation Models for Autonomous Systems
Challenge Period: 1 March - 1 June 2024
Workshop Date: 17 June 2024

The field of autonomy is rapidly evolving, and recent advancements from the machine learning community, such as large language models (LLMs) and world models, bring great potential. We believe the future lies in explainable, end-to-end models that understand the world and generalize to unvisited environments. In light of this, we proposed seven new challenges that push the boundary of existing perception, prediction, and planning pipelines: end-to-end driving at scale; predictive world model; occupancy and flow; multi-view 3D visual grounding; driving with language; mapless driving; and autonomous driving.


  • End-To-End Driving at Scale
    Innovation Award & Outstanding Champion: Team NVIDIA; Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Jose M. Alvarez
    Honorable Runner-Up: ZERON; Zilong Guo, Yi Luo, Long Sha, Dongxu Wang, Panqu Wang, Chenyang Xu, Yi Yang
  • Predictive World Model
    Innovation Award & Honorable Runner-Up: Huawei-Noah & CUHK-SZ; Haiming Zhang, Xu Yan, Ying Xue, Zixuan Guo, Shuguang Cui, Zhen Li, Bingbing Liu
    Outstanding Champion: USTC_IAT_United; Jun Yu, Shuoping Yang, Yinan Shi, Zhihao Yan, Zhao Yang, Ruiyu Liu, Fengzhao Sun, Yunxiang Zhang
  • Occupancy and Flow
    Outstanding Champion: IEIT-AD; Yun Zhao, Peiru Zheng, Zhan Gong
    Honorable Runner-Up: Harbour-Chips; Zhimin Liao, Ping Wei
  • Multi-View 3D Visual Grounding
    Innovation Award & Outstanding Champion: THU-LenovoAI; Henry Zheng, Hao Shi, Yong Xien Chng, Rui Huang, Zanlin Ni, Tianyi Tan, Qihang Peng, Yepeng Weng, Zhongchao Shi, Gao Huang
    Honorable Runner-Up: Mi-Robot; Cai Liang, Bo Li, Zhengming Zhou, Longlong Wang, Pengfei He, Liang Hu, Haoxing Wang
  • Driving with Language
    Innovation Award: ADLM; Yang Dong, Hansheng Liang, Mingliang Zhai, Cheng Li, Meng Xia, Xinglin Liu, Mengjingcheng Mo, Jiaxu Leng, Ji Tao, Xinbo Gao
    Outstanding Champion: NJU-ImagineLab; Jiajhan Li, Tong Lu
    Outstanding Champion: CPS; Jinghan Peng, Jingwen Wang, Xing Yu, Dehui Du
  • Mapless Driving
    Innovation Award & Outstanding Champion: LGmap; Kuang Wu, Sulei Nian, Can Shen, Chuan Yang, Zhanbin Li
  • CARLA Autonomous Driving Challenge
    Innovation Award & Outstanding Champion: LLM4AD; Katrin Renz, Long Chen, Ana-Maria Marcu, Jan Hünermann, Benoit Hanotte, Alice Karnsund, Jamie Shotton, Elahe Arani, Oleg Sinavski
    Honorable Runner-Up: Tuebingen_AI; Julian Zimmerlin, Jens Beißwenger, Bernhard Jaeger, Andreas Geiger, Kashyap Chitta

Foundation Models for Medical Vision

Workshop Date: 17 June 2024

The Segment Anything In Medical Images On Laptop Challenge seeks universal promptable medical image segmentation foundation models that are deployable on laptops or other edge devices without reliance on GPUs.


  • 1st place: Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Trong-Hieu Nguyen-Mau, Hai-Dang Nguyen, Minh-Triet Tran (University of Science; Vietnam National University; John von Neumann Institute, Ho Chi Minh City, Vietnam), MedficientSAM: A Robust Medical Segmentation Model with Optimized Inference Pipeline for Limited Clinical Settings
  • 2nd place: Alexander Pfefferle, Lennart Purucker, Frank Hutter (University of Freiburg, Freiburg; ELLIS Institute Tübingen, Tübingen, Germany), DAFT: Data-Aware Fine-Tuning of Foundation Models for Efficient and Effective Medical Image Segmentation
  • 3rd place: Muxin Wei, Shuqing Chen, Silin Wu, and Dabin Xu (Harbin Institute of Technology, Harbin, China), Rep-MedSAM: Towards Real-time and Universal Medical Image Segmentation


Human Motion Generation (HuMoGen)
Workshop Date: 18 June 2024

Motion is one of the fundamental attributes of human (and animal) life and underlies our actions, gestures as well as our behavior. The capture and synthesis of human motion have been among the core areas of interest for the CVPR community and facilitate a variety of applications such as avatar creation, 3D character animations, AR/VR, crowd simulation, sports analytics and many more.

The prime goal of the workshop is to bring the human motion synthesis community together and foster discussions about the existing challenges and future direction. To enable this, we feature invited talks presented by a diverse group of leading experts spanning a variety of sub-domains. With this workshop, we hope to encourage cross-pollination of ideas coming from different vantage points as well as discuss the gap between the academic and the industrial perspectives of the topic.


  • Best paper: Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis, by Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson

Image Matching Challenge 2024 (Kaggle)

Challenge Period: 25 March – 3 June 2024

Workshop Date:

The goal of this workshop was to construct precise 3D maps using sets of images in diverse scenarios and environments, from world heritage sites to night-time images or transparent objects.


  • Igor Lashkov, Jaafar Mahmoud, Ammar Ali, Yuki Kashiwaba, Vladislav Ostankovich for applying state of the art ML models for 3D reconstruction and mindfully replacing them with manually-engineered solutions when they fail.

9th New Trends in Image Restoration and Enhancement (NTIRE) Workshop and Challenges

Challenge Period: 30 Jan - 22 March, 2024

Workshop Date: 17 June 2024

Image restoration, enhancement and manipulation are key computer vision tasks, aiming at the restoration of degraded image content, the filling in of missing information, or the needed transformation and/or manipulation to achieve a desired target (with respect to perceptual quality, contents, or performance of apps working on such images). There is an ever-growing range of applications in fields such as surveillance, the automotive industry, electronics, remote sensing, or medical image analysis, etc. The emergence and ubiquitous use of mobile and wearable devices offer another fertile ground for additional applications and faster methods. This workshop provides an overview of the new trends and advances in those areas. Moreover, it will offer an opportunity for academic and industrial attendees to interact and explore collaborations.

This Workshop had 17 different associated challenges:

·  Dense and Non-Homogeneous Dehazing
·  Night Photography Rendering
·  Blind Enhancement of Compressed Image
·  Shadow Removal - Track 1 Fidelity
·  Shadow Removal - Track 2 Perceptual
·  Efficient Super Resolution
·  Image Super Resolution (x4)
·  Light Field Image Super-Resolution - Track 1 Fidelity
·  Light Field Image Super-Resolution - Track 2 Efficiency
·  Stereo Image Super-Resolution - Track 1 Bicubic
·  Stereo Image Super-Resolution - Track 2 Realistic
·  HR Depth from Images of Specular and Transparent Surfaces - Track 1 Stereo
·  HR Depth from Images of Specular and Transparent Surfaces - Track 2 Mono
·  Bracketing Image Restoration and Enhancement - Track 1
·  Bracketing Image Restoration and Enhancement - Track 2
·  Portrait Quality Assessment
·  Quality Assessment for AI-Generated Content - Track 1 Image
·  Quality Assessment for AI-Generated Content - Track 2 Video
·  Restore Any Image Model (RAIM) in the Wild
·  RAW Image Super-Resolution
·  Short-form UGC Video Quality Assessment
·  Low Light Enhancement
·  RAW Burst Alignment and ISP Challenge


  • See this document for certificates outlining each challenge’s winner.

MetaFood Workshop

Challenge Period: 25 March – 4 June 2024

Workshop Date: 17 June 2024

3D Food Reconstruction is an innovative venture into the intersection of computer vision and culinary arts, with the goal of reconstructing three-dimensional models of food items from two-dimensional images. This challenge is designed to push the boundaries of 3D reconstruction technology by applying it to a dynamic and uncontrolled setting, such as capturing food images during eating occasions. By reconstructing 3D models with the correct size, this technology can play a vital role in tracking nutritional intake and helping individuals maintain a healthy diet. This endeavor not only aims to enhance sharing food experiences in three dimensions but also has significant potential applications in the fields of nutrition and health monitoring. 


  • First place: Ahmad AlMughrabi, Umair Haroon, Ricardo Jorge Rodrigues Sepúlveda Marques, Petia Radeva
  • Second place: Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang
  • Best Mesh Reconstruction: Yawei Jueluo, Chengyu Shi, Pengyu Wang

Mobile Intelligent Photography and Imaging
Challenge Period: 10 Jan - 1 June 2024
Workshop Date: 18 June 2024

Developing and integrating advanced image sensors with novel algorithms in camera systems is prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Together with the workshop, we organize a few exciting challenges and invite renowned researchers from both industry and academia to share their insights and recent work. Our challenge includes three tracks: Few-shot Raw Image Denoising, Demosaic for HybridEVS Camera, and Nighttime Flare Removal.


  • Nighttime Flare Removal
    First Place: Team: MiAlgo_AI; Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei from Xiaomi Inc., China.
    Second Place: Team: BigGuy; Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang from Tianjin University, Shenzhen MicroBT Electronics Technology Co. Ltd, China, Shanghai Artificial Intelligence Laboratory, China.
    Third Place: Team: SFNet-FR; Florin-Alexandru Vasluianu, Zongwei Wu, George Ciubotariu, Radu Timofte from Computer Vision Lab, CAIDAS & IFI, University of Wurzburg, Germany

  • Few-shot Raw Image Denoising
    First Place: Team: MiVideoNR; Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun from Video Algorithm Group, Camera Department, Xiaomi Inc., China
    Second Place: Team: Samsung MX (Mobile eXperience) Business & Samsung Research China - Beijing (SRC-B); Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, Zikun Liu, Wenhan Luo from Samsung Research China - Beijing (SRC-B), Department of Camera Innovation Group, Samsung Electronics, Sun Yat-sen University
    Third Place: Team: AIIA; Mingde Qiao, Junjun Jiang, Kui Jiang, Yao Xiao, Chuyang Sun, Jinhui Hu and Weijian Ruan from Harbin Institute of Technology, Smart City Research Institute of China, Electronics Technology Group Corporation

  • Demosaic for HybridEVS Camera
    First Place: Team: USTC604; Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha from University of Science and Technology of China
    Second Place: Team: lolers; Jun Cao, Cheng Li, Shu Chen, Liang Ma from Xiaomi Inc., China
    Third Place: Team: Lumos Demosaicker; Shiyang Zhou, Haijin Zeng, Kai Feng, Yongyong Chen, Jingyong Su from Harbin Institute of Technology (Shenzhen), IMEC-UGent, Northwestern Polytechnical University

Multimodal Algorithmic Reasoning Workshop and SMART-101 Challenge

Challenge period: 28 March – 8 June 2024

Workshop Date: 17 June 2024

A focus of the Multimodal Algorithmic Reasoning (MAR) workshop is to nudge the vision community to make progress on building neural networks that have human-like intelligence abilities for abstraction, inference, and generalization. To this end, we conducted the SMART-101 challenge as part of the workshop. This challenge is based on the Simple Multimodal Algorithmic Reasoning Task (SMART) and the SMART-101 dataset consisting of vision-and-language algorithmic reasoning puzzles, which are designed specifically for children in the 6–8 age group. The puzzles demonstrate the need for algorithmic reasoning on abstract visual puzzles, which we believe would be a useful test bed for evaluating the core knowledge competencies of multimodal large language models – a topic of great interest and excitement in computer vision currently. The solution to each puzzle in the challenge needs a mix of various basic mathematical, compositional, and algorithmic reasoning skills, including the knowledge to use basic arithmetic, algebra, spatial reasoning, logical reasoning, path tracing, and pattern matching. The challenge was based on a private test set and the evaluation used the Eval.AI platform.


  • Winner: Jinwoo Ahn, Junhyeok Park, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn, Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo, and Eun-Sol Kim, Hanyang University, South Korea
  • Runner-up: Zijian Zhang and Wei Liu, Harbin Institute of Technology, China

OpenSun3D 2nd Workshop on Open-Vocabulary 3D Scene Understanding

Challenge Period: 17 April - 14 June 2024

Workshop date: 18 June 2024

The ability to perceive, understand and interact with arbitrary 3D environments is a long-standing research goal with applications in AR/VR, robotics, health and industry. Many 3D scene understanding methods are largely limited to recognizing a closed-set of pre-defined object classes. In the first track of our workshop challenge, we focus on open-vocabulary 3D object instance search. Given a 3D scene and an open-vocabulary, text-based query, the goal is to localize and densely segment all object instances that fit best with the specified query. If there are multiple objects that fit the given prompt, each of these objects should be segmented, and labeled as separate instances. The list of queries can refer to long-tail objects, or can include descriptions of object properties such as semantics, material type, and situational context.

Most existing methods in 3D scene understanding are heavily focused on understanding the scene on an object level by detecting or segmenting the 3D object instances. However, identifying 3D objects is only an intermediate step towards a more fine-grained goal. In real-world applications, agents need to successfully detect and interact with the functional interactive elements in the scene, such as knobs, handles and buttons, and reason about their purpose in the scene context. Through interacting with these elements, agents can accomplish diverse tasks, such as opening a drawer or turning on the light. In the second track of our workshop challenge, we focus on open-vocabulary 3D affordance grounding. Given a 3D scene and an open-vocabulary, text-based description of a task (e.g., "open the fridge"), the goal is to segment the functional interactive element that the agent needs to interact with (e.g., fridge handle) to successfully accomplish the task.


  • Track 1: 3D Object Instance Search (based on Natural Language Queries): VinAI-3DIS
  • Track 2: 3D Functionality Grounding (based on Natural Language Queries): simple3d

20th Perception Beyond the Visible Spectrum workshop series (IEEE PBVS)

Challenge Period: Dec 2023 - May 2024

Workshop Date: 18 June 2024

5th Thermal Images Super-Resolution Challenge (TISR)

The fifth Thermal Image Super-Resolution challenge introduces a recently acquired benchmark dataset captured with cross-spectral sensors: visible (Basler camera) and thermal (TAU2 camera). It consists of two tracks (175 teams registered in both challenges).

TISR Track 1 features a single evaluation task, requiring participants to generate an x8 super-resolution thermal image from the given low-resolution thermal images. The input is a noiseless set of images bicubically down-sampled by a factor of 8.

TISR Track 1 winners:

  • 1st place, PSNR and SSIM: Chenyang Wang, Zhiwei Zhong, and Junjun Jiang.
  • 2nd place, PSNR (tie) and SSIM: Weiwei Zhou, Chengkun Ling, and Jiada Lu
  • 2nd place, PSNR (tie): Jiseok Yoon, Wonseok Jang, and Haseok Song
  • 3rd place, PSNR and SSIM: Huiwon Gwon, Hyejeong Jo, and Sunhee Jo
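
The Track 1 ranking above is reported in terms of PSNR and SSIM. As a rough illustration of what PSNR measures, the sketch below computes it between a reference image and a super-resolved estimate. The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for this example only; the challenge's official evaluation code may use a different data range or preprocessing.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized 2D images.

    Images are given as lists of rows of pixel intensities.
    max_value is the assumed peak intensity (255 for 8-bit images).
    """
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if not ref or len(ref) != len(est):
        raise ValueError("images must be non-empty and contain the same number of pixels")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [[100, 100], [100, 100]]
    prediction = [[110, 90], [100, 100]]
    print(f"PSNR: {psnr(ground_truth, prediction):.2f} dB")
```

A higher PSNR means the estimate is closer to the reference in a mean-squared-error sense, which is why it is usually paired with a structural measure such as SSIM.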

TISR Track 2 consists of two evaluation tasks using the newly acquired dataset. The first evaluation involves generating an x8 super-resolution thermal image, while the second requires an x16 super-resolution thermal image. In both cases, the proposed architecture must use the provided high-resolution visible image as guidance for enhancing the low-resolution thermal image.

TISR Track 2 winners:

  • 1st place, Eval 1 and Eval 2: Zhiwei Zhong, Peilin Chen, and Shiqi Wang
  • 2nd place, Eval 1 and Eval 2: Raghunath Sai Puttagunta, Zhu Li, and George York
  • 3rd place, Eval 1 and Eval 2 (tie): Cyprien Arnold and Lama Seoud
  • 3rd place, Eval 2 (tie): Jin Kim, Dongyeon Kang, and Dogun Kim

Published paper: Thermal Image Super-Resolution Challenge Results - PBVS 2024, Rafael E. Rivadeneira, Angel D. Sappa, Chenyang Wang, Junjun Jiang, Zhiwei Zhong, Peilin Chen, and Shiqi Wang


Multi-modal Aerial View Imagery Challenge: Classification (MAVIC-C)

The 2024 MAVIC-C challenge aimed to advance recognition models leveraging SAR and EO imagery by integrating these modalities to improve object recognition (OR) systems. Building on previous challenges, it introduced the enhanced UNICORN dataset and a revised competition format focused on SAR classification. The challenge evaluated model robustness through out-of-distribution measures and traditional accuracy metrics. It attracted significant participation, with 146 teams registering and 50 submitting valid algorithms for rigorous assessment. The winning team, IQSKJSP, achieved the highest total score, accuracy, and AUC. This year's challenge underscored the importance of multi-modal data integration to enhance OR model performance.

MAVIC-C Winners:

  • 1st place: Weilong Guo and Jian Yang
  • 2nd place: Jingwen Huang
  • 3rd place: Xinning Li and Yuning Li

Published paper: Multi-modal Aerial View Image Challenge: SAR Classification, Spencer Low, Oliver Nina, Dylan Bowald, Angel D. Sappa, Nathan Inkawhich, and Peter Bruns


Multi-modal Aerial View Imagery Challenge: Translation (MAVIC-T)

The 2024 MAVIC-T challenge builds on the previous year's focus on Synthetic Aperture Radar (SAR) and Electro-Optical (EO) imagery by expanding to include infrared (IR) and multiple collections from various times and locations. This year's challenge aims to enhance sensor data utility through modality conversion, thereby increasing data diversity and mitigating coverage gaps. It comprises four translation tasks: RGB to IR, SAR to EO, SAR to IR, and SAR to RGB. By leveraging the unique advantages of different sensors, such as SAR's all-weather capabilities and IR's utility in thermal imaging, the challenge seeks to overcome the limitations associated with individual sensor modalities. The development of robust models for multi-modal translation can enhance applications like vision-aided navigation in GNSS-denied environments and automatic target recognition (ATR) tasks, where data from EO imagery can be used to train SAR ATRs. The published paper introduces the MAGIC dataset, details the advancements in sensor translation methods, and provides an overview of the challenge dataset, evaluation metrics, results from different teams, and descriptions of top approaches. The challenge saw participation from 95 teams, with the results of the top ten teams summarized in the paper. The top-performing methods all used pix2pixHD and LPIPS metrics as loss functions. While the performance for SAR-RGB and SAR-IR translation tasks was generally similar among the top teams, SAR-EO and RGB-IR translation tasks were the main differentiators. The winning team, NJUST-KMG, achieved the best overall score across all tasks.

MAVIC-T Winners:

  • 1st place: Xixian Wu, Dian Chao, and Yang Yang
  • 2nd place: Jun Yu, Keda Lu, Shenshen Du, Lin Xu, Peng Chang, Houde Liu, Bin Lan, and Tianyu Liu
  • 3rd place: Zhiyu Wang, Xudong Kang, and Shutao Li

Published paper: Multi-modal Aerial View Image Challenge: Sensor Domain Translation, Spencer Low, Oliver Nina, Dylan Bowald, Angel D. Sappa, Nathan Inkawhich, and Peter Bruns

4th International Workshop on Physics-Based Vision Meets Deep Learning (PBDL)

Workshop Date: 17 June 2024

The Physics Based Vision meets Deep Learning (PBDL) competition is a 3-month event that focuses on addressing the challenges of low-light image enhancement and high dynamic range imaging. This competition is organized by Professor Fu. Attracting over 300 participants and 100 teams, the competition has garnered more than 500 submissions from a diverse range of institutions, including major companies and top universities. The competition features eight tracks: Low-light Object Detection and Instance Segmentation, Low-light Raw Video Denoising with Realistic Motion, Low-light sRGB Image Enhancement, Extreme Low-Light Image Denoising, Low-light Raw Image Enhancement, HDR Reconstruction from a Single Raw Image, Highspeed HDR Video Reconstruction from Events, and Raw Image Based Over-Exposure Correction. This challenge aims to advance the integration of physical imaging and deep learning technologies.

Awardees (track, team name, team leader, affiliation):

  • Low-light Object Detection and Instance Segmentation: equal winners 1. GroundTruth, Xiaoqiang Lu, Xidian University, 2. Xocean, Haiyang Xie, Wuhan University
  • Low-light Raw Video Denoising with Realistic Motion: ZichunWang, Zichun Wang, Beijing Institute of Technology
  • Low-light sRGB Image Enhancement: IMAGCX, Xiang Chen, Nanjing University of Science and Technology
  • Extremely Low-Light Image Denoising: jly724215288, Linyan Jiang, Tencent
  • Low-light Raw Image Enhancement: Lc, Cheng Li, Xiaomi Inc., China
  • HDR Reconstruction from a Single Raw Image: Alanosu, Liwen Zhang, ZTE Corporation
  • Highspeed HDR Video Reconstruction from Events: IVISLAB, Qinglin Liu, Harbin Institute of Technology
  • Raw Image Based Over-Exposure Correction: gxj, Xuejian Gou, Xidian University

Pixel-level Video Understanding in the Wild Challenge

Challenge Period: 15 May – 25 May, 2024

Workshop Date: 17 June 2024

The Pixel-level Video Understanding in the Wild (PVUW) challenge aims at challenging yet practical video understanding. This year, we add two new tracks: the Complex Video Object Segmentation track based on MOSE and the Motion Expression guided Video Segmentation track based on MeViS. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset, MeViS, to study natural language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios.


  • MOSE Track: Team: PCL_VisionLab; Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang from Harbin Institute of Technology, Shenzhen, Peng Cheng Laboratory, University of California at Merced
  • MeViS Track: Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng from Southern University of Science and Technology, University of Sheffield, University of Warwick

RetailVision: Overview and Amazon Deep Dive
Challenge Period: 31 March – 29 May 2024
Workshop Date: 18 June 2024

The challenge consists of two tracks. The Video Temporal Action Localization (TAL) and Spatial Temporal Action Localization (STAL) challenges aim at localizing products associated with actions of interest in video. The videos are captured in a physical retail grocery store by a camera mounted on a regular shopping cart, looking inwards at the basket volume. The TAL challenge will consist of training a model that temporally localizes (along the time axis) the actions of interest in each video in the dataset. STAL, on the other hand, will involve localizing the product associated with the action both spatially and temporally in the video. The action of interest in this challenge is one of three commonplace actions performed by people shopping with grocery carts: Take (putting a product into the basket), Return (removing a product from the basket), and Rummage (moving products around in the basket). We will evaluate the models' performance based on frame-mAP (temporal localization) and tube-mAP (spatio-temporal localization).

The goal of the Multi-Modal Product Retrieval (MPR) challenge is to retrieve the product identity from an open pool of candidate products. Participants will be given a training dataset consisting of (i) images of different products and (ii) textual descriptions of the products, and are expected to design methods that accurately retrieve the product identity by measuring similarity between images and textual descriptions. For evaluation, a held-out set of probe images and a pool of textual descriptions of catalog products will be passed to the model. The model is expected to output a ranked list of catalog products based on the similarity between images and descriptions. We use Cumulative Matching Characteristics (CMC) to evaluate the performance of each solution.
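
To make the CMC evaluation described above more concrete, here is a minimal sketch of how a CMC curve could be computed from image-to-text similarity scores. The function name `cmc_curve`, the list-of-lists similarity layout (one row per probe image, one column per catalog description), and the assumption that each probe has exactly one correct catalog entry are illustrative choices, not details taken from the challenge.

```python
def cmc_curve(similarity, true_index, max_rank):
    """Cumulative Matching Characteristics up to max_rank.

    similarity[i][j]: score between probe image i and catalog description j
                      (higher means more similar).
    true_index[i]:    column index of the correct catalog product for probe i.
    Returns a list where entry k-1 is the fraction of probes whose correct
    product appears within the top-k ranked candidates.
    """
    if not similarity:
        raise ValueError("at least one probe is required")

    hits = [0] * max_rank
    for scores, target in zip(similarity, true_index):
        # Rank catalog entries from most to least similar.
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        rank = order.index(target)  # 0-based position of the correct product
        # A match at position `rank` counts for every cutoff k > rank.
        for k in range(rank, max_rank):
            hits[k] += 1
    return [h / len(similarity) for h in hits]


if __name__ == "__main__":
    scores = [
        [0.9, 0.1, 0.3],  # correct product ranked first
        [0.2, 0.5, 0.8],  # correct product ranked second
        [0.4, 0.6, 0.1],  # correct product ranked third
    ]
    print(cmc_curve(scores, true_index=[0, 1, 2], max_rank=3))
```

The first entry of the returned list is the rank-1 accuracy; later entries show how quickly the correct product enters the ranked list as the cutoff grows.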


  • TAL and STAL Track
    First Place: Tencent WeChat CV; Zhenhua Liu, Tianyi Wang, Ke Mei, and Guangting Wang
  • MPR Track
    First Place: Tencent WeChat CV; Zhenhua Liu, Tianyi Wang, Ke Mei, and Guangting Wang
    Second Place: Amazon MENA-Tech; Sarthak Srivastava

Rhobin: 2nd Workshop on Reconstruction of Human-Object Interactions

Challenge Period: 5 Feb – 30 May, 2024

Workshop Date: 17 June 2024

Given the importance of human-object interaction, we propose five challenges in reconstructing 3D humans and objects and estimating 3D human-object and human-scene contact from monocular RGB images. We have seen promising progress in reconstructing human body mesh or estimating 6DoF object pose from single images. However, most of these works focus on occlusion-free images, which are not realistic for settings during close human-object interaction since humans and objects occlude each other. This makes inference more difficult and poses challenges to existing state-of-the-art methods. Similarly, methods estimating 3D contacts have also seen rapid progress, but are restricted to scanned or synthetic datasets, and struggle with generalization to in-the-wild scenarios. In this workshop, we want to examine how well the existing human and object reconstruction and contact estimation methods work under more realistic settings and, more importantly, understand how they can benefit each other for accurate interaction reasoning. Specifically, we have the following five challenges: 3D human reconstruction, 6DoF pose estimation of rigid objects, joint reconstruction of human and object, video-based tracking, and 3D contact prediction from 2D images.


  • Fabien Baradel, Thomas Lucas, Matthieu Armando, Romain Brégier, Salma Galaaoui, Philippe Weinzaepfel, Gregory Rogez, NAVER LABS
  • Yangxu Yan, Anlong Ming, Huadong Ma, Zhaowen Lin, Weihong Yao, Beijing University of Posts and Telecommunications and Shanghai Vision Era Co. Ltd.
  • Yuyang Jing, Anlong Ming, Yongchang Zhang, Xiaohai Yu, Huadong Ma, Weihong Yao, Beijing University of Posts and Telecommunications and Shanghai Vision Era Co. Ltd.

ScanNet++ Novel View Synthesis and 3D Semantic Understanding Challenge

Workshop Date: 18 June 2024

The ScanNet++ challenge offers the first benchmark challenge for novel view synthesis in large-scale 3D scenes, along with high-fidelity, large-vocabulary 3D semantic scene understanding. Winners have submitted leading novel view synthesis and 3D semantic understanding methods on the ScanNet++ benchmark.


  • Point Transformer V3 with Point Prompt Training; Xiaoyang Wu, The University of Hong Kong
  • Robust Point-based Graphics; Zizhuang Wei, Huawei
  • Feature Splatting for Better Novel View Synthesis with Low Overlap; Tomás Berriel, University of Zaragoza

SyntaGen - Harnessing Generative Models for Synthetic Visual Datasets

Workshop date: 17 June 2024

The competition aims to drive innovation in creating high-quality synthetic datasets using pretrained Stable Diffusion and the 20 class names from PASCAL VOC 2012 for semantic segmentation. The quality of these datasets is evaluated by training a DeepLabv3 model on them and assessing its performance on a private test set, with a sample validation set from PASCAL VOC 2012. Submissions are ranked based on the mIoU metric. This framework reflects the practical use of synthetic datasets as replacements for real datasets.
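
As an illustration of the ranking metric mentioned above, the following sketch computes mean intersection-over-union (mIoU) from flattened predicted and ground-truth label maps. The function name `mean_iou`, the flat-list input format, the choice to skip classes that never occur, and the `ignore_label` value of 255 (the label commonly used for void pixels in PASCAL VOC annotations) are assumptions for this example; the competition's own evaluation script may handle these details differently.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union over classes present in pred or target.

    pred, target: flat sequences of integer class labels, one per pixel.
    Pixels whose ground-truth label equals ignore_label are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            # Correct pixel: contributes to both intersection and union of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the true class and,
            # if valid, of the predicted class.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1

    # Average only over classes that appear at least once.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    prediction = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    print(f"mIoU: {mean_iou(prediction, ground_truth, num_classes=2):.4f}")
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.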


  • 1st place: HNU-VPAI - Hunan University: Zhiyu Wang, Puhong Duan, Zhuojun Xie, Wang Liu, Bin Sun, Xudong Kang, Shutao Li

1st Workshop on Urban Scene Modeling

Workshop date: 17 June 2024

Challenge Period: S23DR Challenge, 15 March – 14 June 2024; Building3D Challenge, 15 February – 31 May 2024

Structured Semantic 3D Reconstruction (S23DR) Challenge - The objective of this competition is to facilitate the development of methods for transforming posed images (sometimes also called "oriented images") / SfM outputs into a structured geometric representation (wire frame) from which semantically meaningful measurements can be extracted. In short: More Structured Structure from Motion.

Building3D Challenge – Building3D is an urban-scale, publicly available dataset consisting of more than 160 thousand buildings with corresponding point clouds, meshes, and wireframe models covering 16 cities in Estonia. For this challenge, approximately 36,000 buildings from the city of Tallinn are used as the training and testing dataset. We require algorithms to take the original point cloud as input and regress the wireframe model.


S23DR Challenge:

  • 1st place: Denys Rozumnyi
  • 2nd place: Kunal Chelani
  • 3rd place: Kuo-Chin Lien
  • Honorable Mention: Serhii Ivanov
  • Honorable Mention: Wenzhao Tang, Weihang Li 

Building3D Challenge:

  • 1st place: Yuzhou Liu, Lingjie Zhu, Hanqiao Ye, Xiang Gao, Shuhan Shen
  • 2nd place: Hongxin Yang, Siyu Chen, YuJun Liu
  • 3rd place: Jiahao Zhang, Qi Liu, Yuchao Dai, Le Hui, Zhixiang Pei
  • 3rd place: Hongye Hou, Fangyu Du, Jiaxin Ren, Yuxuan Jiang, Xiaobin Zhai
  • Honorable Mention: Fuhai Sun, Yingliang Zhang, Qiaoqiao Hao, Ting Han, Duxin Zhu, Wenjing Wu, Tianrui Bayles-Rea, Guang Gao

The 3rd CVPR Workshop on Vision Datasets Understanding

Workshop Date: 17 June 2024

Data is the fuel of computer vision, on which state-of-the-art systems are built. A robust object detection system not only needs a strong model architecture and learning algorithms but also relies on a comprehensive large-scale training set. Despite the pivotal significance of datasets, existing research in computer vision is usually algorithm-centric. Compared with the number of algorithm-centric works in domain adaptation, quantitative understanding of the domain gap is much more limited. As a result, there are currently few investigations into the representations of datasets, while in contrast, an abundance of literature concerns ways to represent images or videos, the essential elements of datasets.

The 3rd VDU workshop aims to bring together research works and discussions focusing on analyzing vision datasets, as opposed to the commonly seen algorithm-centric counterparts. 



Visual Anomaly and Novelty Detection 2.0 2024

Challenge Period: 15 April to 1 June 2024
Workshop Date: 17 June 2024

This year our challenge aims to bring visual anomaly detection closer to industrial visual inspection, which has wide real-world applications. We look forward to participants from both academia and industry. Proudly sponsored by Intel, this challenge consists of two categories:

  • Category 1 — Adapt & Detect: Robust Anomaly Detection in Real-World Applications
  • Category 2 — VLM Anomaly Challenge: Few-Shot Learning for Logical and Structural Detection

Participants can choose a category or enter both in two separate submissions. These challenge categories aim to advance existing anomaly detection literature and increase its adoption in real-world settings. We invite the global community of innovators, researchers, and technology enthusiasts to engage with these challenges and contribute towards advancing anomaly detection technologies in real-world scenarios.

From April 15th – June 1st, 2024, this global community can showcase their ideas on how to solve these challenges in the visual anomaly detection field.

For more information about the submission and the challenge please visit the webpage


Category 1 - First Place: Project - ARNet for Robust Anomaly Detection, Babar Hussain, Tcl_liu

Category 1 - Second Place: Project - CanhuiTang_submission_v2, Team IAIR: Canhui Tang, Mang Cao, Sanping Zhou

Category 1 - Honorable mentions: Project - Final submission in VAND2, Amal Alsulamy, Munirah Alyahya, Nouf Alajmi. Project - Ensemble PatchCore, Yukino Tusuzuki

Category 2 - First Place: Project - Anomaly MoE, Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Jinqiao Wang

Category 2 - Second Place: Project - RJVoyagers, Nestor Bao, YJ Chen

Category 2 - Honorable mentions: Project - Locore, Xi Jiang, Hanqiu Deng. Project - MVTec LOCO Diffusion-AD, Chu Sam, Jay Liu


VizWiz Grand Challenge

Workshop Date: 18 June 2024

Our goal for this workshop is to educate researchers about the technological needs of people with vision impairments while empowering researchers to improve algorithms to meet these needs. A key component of this event will be to track progress on six dataset challenges, where the tasks are to answer visual questions, ground answers, recognize visual questions with multiple answer groundings, recognize objects in few-shot learning scenarios, locate objects in few-shot learning scenarios, and classify images in a zero-shot setting. The second key component of this event will be a discussion about current research and application issues, including invited speakers from both academia and industry who will share their experiences in building today’s state-of-the-art assistive technologies as well as designing next-generation tools.


  • Challenge 1: Answer visual questions: Shopline AI Research; Baojun Li, Jiamian Huang, Tao Liu
  • Challenge 2: Ground answers to visual questions: MGTV, China; Kang Zhang, Yi Yu, Shien Song, Haibo Lu, Jie Yang, Yangke Huang, Hongrui Jin
  • Challenge 3: Recognize visual questions with multiple answer groundings: MGTV, China; Yi Yu, Kang Zhang, Shien Song, Haibo Lu, Jie Yang, Yangke Huang, Hongrui Jin
  • Challenge 4: Recognize objects in few-shot learning scenarios: SK Ecoplan; Hyunhak Shin, Yongkeun Yun, Dohyung Kim, Jihoon Seo, Kyusam Oh
  • Challenge 5: Locate objects in few-shot learning scenarios: China Telecom Artificial Intelligence Technology (Beijing) Co. and Xi'an Jiaotong University; Rongbao Han, Zihao Guo, Jin Wang, Tianyuan Song, Hao Yang, Jinglin Yang, Hao Sun
  • Challenge 6: Classify images in a zero-shot setting: Hanbat National University; Huiwon Gwon, Sunhee Jo, Hyejeong Jo, Chanho Jung

What is next in multimodal foundation models?

Workshop Date: 18 June 2024

Challenge Period: 20 March – 20 May 2024

Multimodal Foundation Models (MMFMs) have shown unprecedented performance in many computer vision tasks. However, on some very specific tasks such as document understanding, their performance is still underwhelming. To evaluate and improve these strong multimodal models for the task of document image understanding, we harness a large amount of publicly available and privately gathered data (listed in the image above) and propose a challenge. In the following, we list all the important details related to the challenge. Our challenge runs in two separate phases.


  • First place: Nanbeige-VL: Rui Xiao, James Wu, Emily Zhou, Thomas An, Michael Du, David Li, Daniel Cheng - Report
  • Second place: Necla: Vijay Kumar B G 
  • Third place: Leloy: Franz Louis Cesista