CVPR 2024 Workshop Challenge Awardees
Challenge and Competition Winners
6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW) Challenge period: 13 Jan – 19 March 2024 The Competition is a continuation of the ABAW Competitions held at CVPR 2023 and 2022, ECCV 2022, ICCV 2021, IEEE FG 2020, and CVPR 2017. It includes the five Challenges listed below:
Awardees:
|
The 4th Workshop of Adversarial Machine Learning on Computer Vision: Robustness of Foundation Models Challenge Period: 15 March - 20 May 2024 Workshop Date: 17 June 2024 As we observe the rapid evolution of large foundation models, their incorporation into the automotive landscape gains prominence, owing to their advanced capabilities. However, this progress is not without its challenges, and one formidable obstacle is the vulnerability of these large foundation models to adversarial attacks. Adversaries could employ techniques such as adversarial textures and camouflage, along with other sophisticated strategies, to manipulate the perception, prediction, planning, and control mechanisms of the vehicle. These actions pose risks to the reliability and safety of autonomous driving systems. To advance research on building robust models for autonomous driving systems (ADS), we organized this challenge track to motivate novel adversarial attack algorithms against a spectrum of vision tasks within ADS, spanning image recognition, object detection, and semantic segmentation. Participants were encouraged to develop attack methods that generate subtle yet harmful images to expose vulnerabilities of the large foundation models used in ADS. (The challenge website) Awardees: Challenge: Black-box Adversarial Attacks on Vision Foundation Models
Distinguished Paper Award
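To give a concrete sense of the attack setting described above (and not of any participant's or awardee's method), below is a minimal sketch of a score-based black-box attack: random perturbations within an L-infinity budget are kept only when they reduce the model's confidence in the true class. The `model`, input tensor, and budget values are illustrative placeholders.

```python
# Hedged illustration only: a generic query-based (black-box) attack via random
# search under an L-infinity budget. `model`, `x`, and `eps` are placeholders.
import torch

def random_search_attack(model, x, label, eps=8 / 255, steps=500):
    """x: (1, C, H, W) image in [0, 1]; label: int index of the true class."""
    model.eval()
    best = x.clone()
    with torch.no_grad():
        best_conf = torch.softmax(model(best), dim=1)[0, label].item()
        for _ in range(steps):
            delta = (torch.rand_like(x) * 2 - 1) * eps      # random point in the eps-ball
            cand = (x + delta).clamp(0, 1)
            conf = torch.softmax(model(cand), dim=1)[0, label].item()
            if conf < best_conf:                            # keep it if confidence drops
                best, best_conf = cand, conf
    return best
```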
|
Agriculture-Vision: Challenges and Opportunities for Computer Vision in Agriculture Challenge Period: 1 Feb – 3 June 2024 Workshop Date: 18 June 2024 The core of this challenge is to strategically use both the extended, unlabeled Agriculture-Vision dataset and the labeled original Agriculture-Vision dataset. The aim is to enhance model performance by effectively applying semi-supervised learning techniques. Participants are challenged to integrate the depth and variety of the unlabeled dataset with the structured framework of the labeled dataset to achieve superior results. Awardees:
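As context for the semi-supervised setup described above, here is a minimal pseudo-labeling sketch. It is a generic recipe, not the winners' approach; the `model`, data batches, and the 0.9 confidence threshold are illustrative placeholders.

```python
# Generic pseudo-labeling step mixing labeled and unlabeled agriculture imagery.
# `model`, the batches, and the confidence threshold are assumptions for illustration.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, sup_batch, unsup_images, threshold=0.9):
    images, masks = sup_batch                              # labeled pair
    sup_loss = F.cross_entropy(model(images), masks)

    with torch.no_grad():                                  # pseudo-labels from the current model
        probs = torch.softmax(model(unsup_images), dim=1)
        conf, pseudo = probs.max(dim=1)

    pixel_loss = F.cross_entropy(model(unsup_images), pseudo, reduction="none")
    unsup_loss = (pixel_loss * (conf > threshold)).mean()  # keep confident pixels only

    loss = sup_loss + unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```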
|
SPARK Challenge @ AI4SPACE Workshop Challenge Period: 15 January to 28 March 2024 The 3rd edition of SPARK (SPAcecraft Recognition leveraging Knowledge of Space Environment) organized as part of the AI4Space workshop, in conjunction with the IEEE / CVF CVPR 2024, focuses on designing data-driven approaches for spacecraft component segmentation and trajectory estimation. SPARK utilizes data synthetically simulated with a game engine alongside data collected from the Zero-G lab at the University of Luxembourg. The SPARK challenge was organized by the Computer Vision, Imaging & Machine Intelligence Research Group (CVI²) at the Interdisciplinary Centre for Security, Reliability and Trust (SnT) of the University of Luxembourg (UL). The challenge comprises two streams:
Awardees:
|
The 8th AI City Challenge Challenge Period: 22 Jan – 25 March, 2024 Workshop Date: 17 June 2024 The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS). The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Awardees:
Winner: Shanghai Jiao Tong University + AI Lab, Lenovo Research + Monash University; Zhenyu Xie, Zelin Ni, Wenjie Yang, Yuang Zhang, Yihang Chen, Yang Zhang, Xiao Ma Runner-Up: Yachiyo Engineering Co., Ltd. + Chubu University; Ryuto Yoshida, Junichi Okubo, Junichiro Fujii, Masazumi Amakata, Takayoshi Yamashita
Winner: Alibaba Group; Zhizhao Duan, Hao Cheng, Duo Xu, Xi Wu, Xiangxie Zhang, Xi Ye, Zhen Xie Runner-Up: Ho Chi Minh City University of Technology, VNU-HCM, Vietnam + University of Science, VNU-HCM, Vietnam + University of Information Technology, VNU-HCM, Vietnam + FPT Telecom, Vietnam + AI Lab- AI VIETNAM + Vietnam National University Ho Chi Minh City; Khai Trinh Xuan, Khoi Nguyen Nguyen, Bach Hoang Ngo, Vu Dinh Xuan, Minh-Hung An, Quang-Vinh Dinh
Winner: China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd.; Tiantian Zhang, Qingtian Wang, Xiaodong Dong, Wenqing Yu, Hao Sun, Xuyang Zhou, Aigong Zhen, Shun Cui, Dong Wu, Zhongjiang He Runner-Up: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea; Huy-Hung Nguyen, Chi Dai Tran, Long Hoang Pham, Duong Nguyen-Ngoc Tran, Tai Huu-Phuong Tran, Duong Khac Vu, Quoc Pham-Nam Ho, Ngoc Doan-Minh Huynh, Hyung-Min Jeon, Hyung-Joon Jeon, Jae Wook Jeon
Winner: VNPTAI, VNPTGroup, Hanoi, Vietnam + Faculty of Computer Science, Phenikaa University, Hanoi, Vietnam + University of Transport and Communications, Hanoi, Vietnam; Viet Hung Duong, Duc Quyen Nguyen, Thien Van Luong, Huan Vu, Tien Cuong Nguyen Runner-Up: Nota Inc., Republic of Korea; Wooksu Shin, Donghyuk Choi, Hancheol Park, Jeongho Kim Honorable Mention: Department of Electrical and Computer Engineering Sungkyunkwan University; Long Hoang Pham, Quoc Pham-Nam Ho, Duong Nguyen-Ngoc Tran, Tai Huu-Phuong Tran, Huy-Hung Nguyen, Duong Khac Vu, Chi Dai Tran, Ngoc Doan-Minh Huynh, Hyung-Min Jeon, Hyung-Joon Jeon, Jae Wook Jeon
Winner: University of Information Technology, VNU-HCM, Vietnam + Vietnam National University, Ho Chi Minh City, Vietnam; Hao Vo, Sieu Tran, Duc Minh Nguyen, Thua Nguyen, Tien Do, Duy-Dinh Le, Thanh Duc Ngo Runner-Up: China Mobile Shanghai ICT Co.,Ltd; Yunliang Chen, Wei Zhou, Zicen Zhou, Bing Ma, Chen Wang, Yingda Shang, An Guo, Tianshu Chu |
Workshop Date: 17 June 2024 The AI for Content Creation (AI4CC) workshop at CVPR brings together researchers in computer vision, machine learning, and AI. Content creation is required for simulation and training data generation, media like photography and videography, virtual reality and gaming, art and design, and documents and advertising (to name just a few application domains). Recent progress in machine learning, deep learning, and AI techniques has allowed us to turn hours of manual, painstaking content creation work into minutes or seconds of automated or interactive work. For instance, generative adversarial networks (GANs) can produce photorealistic images of 2D and 3D items such as humans, landscapes, interior scenes, virtual environments, or even industrial designs. Neural networks can super-resolve videos and increase their frame rate (super slo-mo), interpolate between photos with intermediate novel views and even extrapolate, and transfer styles to convincingly render and reinterpret content. In addition to creating awe-inspiring artistic images, these techniques offer unique opportunities for generating additional and more diverse training data. Learned priors can also be combined with explicit appearance and geometric constraints, perceptual understanding, or even functional and semantic constraints of objects. Awardees:
|
AIS: Vision, Graphics and AI for Streaming Challenge Period: 5 Feb - 29 March, 2024 Workshop Date: 17 June 2024 Welcome to the 1st Workshop on AI for Streaming at CVPR! This workshop focuses on unifying new streaming technologies, computer graphics, and computer vision, from the modern deep learning point of view. Streaming is a huge industry where hundreds of millions of users demand high-quality content every day on different platforms. Computer vision and deep learning have emerged as revolutionary forces for rendering content, image and video compression, enhancement, and quality assessment. From neural codecs for efficient compression to deep learning-based video enhancement and quality assessment, these advanced techniques are setting new standards for streaming quality and efficiency. Moreover, novel neural representations also pose new challenges and opportunities in rendering streamable content, allowing us to redefine computer graphics pipelines and visual content. Check out our website for more information: https://ai4streaming-workshop.github.io/ This Workshop consisted of 5 different challenges:
Awardees:
|
Workshop on Autonomous Driving (WAD) Challenge Date: 22 May 2024 Workshop Date: 17 June 2024 The 5th edition of the Waymo Open Dataset Challenge covered a range of fundamental research topics in the autonomous driving domain and included:
Awardees:
|
AVA: Accessibility, Vision, and Autonomy Meet Challenge Workshop Date: 18 June 2024 Computer vision and machine learning benchmarks for understanding humans generally lack diversity, rarely incorporating individuals with mobility aids. To address this gap, our challenge leverages a large synthetic benchmark for accessibility-oriented computer vision tasks, particularly in the context of embodied navigation and autonomous systems. The benchmark is quite challenging, spanning different camera perspectives, ambient settings (weather, towns), dense social navigation settings, and diverse pedestrian mobility aids. There are two main tasks: instance segmentation and keypoint detection. The evaluation follows the COCO AP protocol but introduces fine-grained categories, e.g., classes such as 'cane' and 'wheelchair' (see the evaluation sketch after the awardee list). Such classes confound state-of-the-art vision models; for instance, pedestrians with a cane or in wheelchairs tend to degrade model performance. The challenge was hosted on Eval.AI (both for segmentation and keypoints). Awardees: Challenge 1: Instance Segmentation Challenge
Challenge 2: Pose Estimation Challenge
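As referenced above, the following is a minimal sketch of COCO-style AP evaluation restricted to the fine-grained accessibility categories; the file names are hypothetical placeholders, not the benchmark's actual annotation files.

```python
# Sketch of COCO-style AP evaluation limited to fine-grained mobility-aid classes.
# The annotation/prediction file names are hypothetical placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ava_instances_val.json")            # ground-truth annotations
coco_dt = coco_gt.loadRes("predictions.json")       # model predictions

evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")
# Restrict evaluation to accessibility categories such as 'cane' and 'wheelchair'.
evaluator.params.catIds = coco_gt.getCatIds(catNms=["cane", "wheelchair"])
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                               # prints AP/AR over IoU thresholds
```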
|
Workshop date: 17 June 2024 With the development and success of computer vision algorithms, applying these methodologies to real-world environments becomes increasingly feasible. However, real-world applications may involve degradations which are not often included in standard training and testing datasets: poor illumination, adverse weather conditions, aberrated optics, or complicated sensor noise. In addition, non-standard imaging solutions such as those for flexible wearables or augmented reality headsets may require unconventional processing algorithms, which complicates their path to real-world application. What is the gap in performance for the current state of the art when placed into harsh, real-world environments? The central theme of this workshop is to invite researchers to investigate this question and push the state of the art forward for real-world application of recognition tasks in challenging environments. Continuing the history of success at CVPR 2018–2023, we provide its 7th edition for CVPR 2024. It inherits the successful benchmark datasets, platform, and other evaluation tools used in previous UG2+ workshops, while broadening its scope with new tracks and applications for real-world use of computer vision algorithms. Awardees: UG2+ Challenge winners: Challenge 1: Atmospheric turbulence mitigation
Challenge 2: All weather semantic segmentation
Challenge 3: UAV tracking and pose estimation
|
5th Chalearn Face Anti-spoofing Workshop and Challenge Challenge Period: 1 February – 14 April 2024 In recent years the security of face recognition systems has been increasingly threatened. Face Anti-spoofing (FAS) is essential to protect face recognition systems from various attacks. In order to attract researchers and push forward the state of the art in Face Presentation Attack Detection (PAD), we organized four editions of the Face Anti-spoofing Workshop and Competition at CVPR 2019, CVPR 2020, ICCV 2021, and CVPR 2023, which together have attracted more than 1200 teams from academia and industry and greatly promoted the development of algorithms to overcome many challenging problems. In addition to physical presentation attacks (PAs), such as printing, replay, and 3D mask attacks, digital face forgery attacks (FAs) remain a threat that seriously endangers the security of face recognition systems. FAs aim to attack faces using digital editing at the pixel level, such as identity transformation, facial expression transformation, attribute editing, and facial synthesis. At present, detection algorithms for these two types of attacks, "Face Anti-spoofing (FAS)" and "Deep Fake/Forgery Detection (DeepFake)", are still being studied as independent computer vision tasks and cannot achieve the functionality of a unified detection model that responds to both types of attacks simultaneously. To give continuity to our efforts on these relevant problems, we organized the 5th Face Anti-Spoofing Workshop @ CVPR 2024. We identify the different types of attack clues as the main reason for the incompatibility between these two detection tasks. The spoofing clues based on physical presentation attacks are usually caused by color distortion, screen moiré patterns, and production traces. In contrast, the forgery clues based on digital editing attacks are usually changes in pixel values. The fifth competition aims to encourage the exploration of common characteristics in these two types of attack clues and promote research on unified detection algorithms. Fully considering the above difficulties and challenges, we collected a Unified physical-digital Attack dataset, namely UniAttackData, for this fifth edition, for algorithm design and competition promotion; it includes 1,800 participants with 2 physical and 12 digital attack types, for a total of 28,706 videos. Awardees: Track 1: Unified Physical-Digital Face Attack Detection
Track 2: Snapshot Spectral Imaging Face Anti-spoofing
|
4th Workshop on Computer Vision in the Built Environment The 4th International Scan-to-BIM competition targets the development of computer vision methods that automatically generate the semantic as-is status of buildings given their 3D point clouds. Specifically, the challenge includes two tracks: (i) 2D Floor Plan Reconstruction: given a 3D point cloud as input, this track aims at automatically reconstructing the 2D vectorized floorplan of the captured area; and (ii) 3D Building Model Reconstruction: given a 3D point cloud as input, this track aims at reconstructing the 3D parametric semantic model of the captured area. The evaluation is based on geometric, semantic, and topological metrics. Awardees:
|
5th Workshop on Continual Learning in Computer Vision This challenge aims to explore techniques that combine two fundamental aspects of data efficiency: continual learning and unlabelled data usage. In real-world machine learning applications, it is fair to assume that data is not cheap to acquire, store and label. Therefore, it is crucial to develop strategies that are flexible enough to learn from streams of experiences, without forgetting what has been learned previously. Additionally, contextual unlabelled data can often be provided for a fraction of the cost of labeled data and could be exploited to integrate additional information into the model. Awardees:
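To illustrate the stream-learning setting described above (this is not the challenge baseline), here is a tiny experience-replay sketch; the buffer size, batch sizes, `model`, and `loader` are assumed placeholders.

```python
# Hedged sketch of experience replay for learning from a stream of experiences
# without forgetting. Buffer size, batch sizes, `model`, and `loader` are placeholders.
import random
import torch
import torch.nn.functional as F

buffer, buffer_size = [], 2000

def train_on_experience(model, optimizer, loader):
    for x, y in loader:
        if buffer:                                   # replay a few stored samples
            rx, ry = zip(*random.sample(buffer, min(32, len(buffer))))
            bx = torch.cat([x, torch.stack(rx)])
            by = torch.cat([y, torch.stack(ry)])
        else:
            bx, by = x, y
        loss = F.cross_entropy(model(bx), by)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        for xi, yi in zip(x[:4], y[:4]):             # store a handful for later experiences
            if len(buffer) < buffer_size:
                buffer.append((xi.detach(), yi.detach()))
```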
|
Data Curation and Augmentation in Medical Imaging (DCAMI) Workshop Award Period: 31 January to 22 March 2024 Data-driven computer vision and AI solutions for medical imaging hold great potential to make a real-world impact by improving patient care. However, safety requirements associated with healthcare pose major challenges for this research field, especially regarding data curation. Collection and annotation of medical data is often resource-intensive due to the need for medical expertise. At the same time, data quality is of the highest importance to ensure safe and fair usage in clinical settings. As a result, efficient data curation and validation, learning from small data, and data synthesis are important areas of research.
Awardees:
|
Domain Adaptation, Explainability and Fairness in AI for Medical Image Analysis (DEF-AI-MIA) Workshop The DEF-AI-MIA COV19D Competition was the 4th in the series of COV19D Competitions, following the first three held in the framework of the ICCV 2021, ECCV 2022, and ICASSP 2023 conferences. It included two Challenges: i) Covid-19 Detection Challenge and ii) Covid-19 Domain Adaptation Challenge. Both Challenges were based on the COV19-CT-DB database, which includes chest CT scan series. The 1st Challenge aimed to build effective models for fast and reliable Covid-19 diagnosis based on 3-D chest CT scans, whilst the 2nd Challenge aimed to use Deep Learning Domain Adaptation methodologies to diagnose Covid-19 across annotated datasets from medical sources and non-annotated datasets from other medical sources. Awardees:
|
The Fifth Annual Embodied Artificial Intelligence Workshop Challenge Period: March-June 2024 Workshop Date: 18 June 2024 Minds live in bodies, and bodies move through a changing world. The goal of embodied artificial intelligence is to create agents, such as robots, which learn to creatively solve challenging tasks requiring interaction with the environment. Fantastic advances in deep learning and the increasing availability of large datasets like ImageNet have enabled superhuman performance on a variety of AI tasks previously thought intractable. These advances have supercharged embodied AI, enabling a growing collection of researchers to make rapid progress towards intelligent agents which can:
The goal of the Embodied AI Workshop is to bring together researchers from computer vision, language, graphics, and robotics to share and discuss the latest advances in embodied intelligent agents. The 2024 workshop hosted six exciting challenges covering a wide range of topics such as mobile manipulation, visual/tactile skill learning, and visual, language and social navigation. Our winners include:
For more information on these challenges and their winners, see their websites above, or visit the Embodied AI Workshop website https://embodied-ai.org/ for present and past challenges. |
First Joint Egocentric Vision (EgoVis) Workshop Workshop date: 17 June 2024 The first edition of the EgoVis workshop has seen the participation of researchers from diverse backgrounds, geographical regions and institutions, both academic and industrial. Aiming to be the focal point for the egocentric computer vision community to meet and discuss progress in this fast-growing research area, the workshop addresses egocentric vision in a comprehensive manner, including keynote talks, research presentations, an awards ceremony, as well as key research challenges in video understanding across several benchmark datasets covering multi-modal data, interaction learning, self-supervised learning, and AR/VR, with applications to cognitive science and robotics. This year, the workshop offered 31 different challenges across six egocentric benchmark datasets, namely HoloAssist, Aria Digital Twin, Aria Synthetic Environments, Ego4D, Ego-Exo4D, and EPIC-KITCHENS. Winning teams shared their insights either through invited talks or poster presentations. Awardees:
|
Foundation Models for Autonomous Systems The field of autonomy is rapidly evolving, and recent advancements from the machine learning community, such as large language models (LLMs) and world models, bring great potential. We believe the future lies in explainable, end-to-end models that understand the world and generalize to unvisited environments. In light of this, we proposed seven new challenges that push the boundary of existing perception, prediction, and planning pipelines: end-to-end driving at scale; predictive world model; occupancy and flow; multi-view 3D visual grounding; driving with language; mapless driving; and autonomous driving. Awardees:
|
Foundation Models for Medical Vision Workshop Date: 17 June 2024 The Segment Anything In Medical Images On Laptop Challenge seeks universal promptable medical image segmentation foundation models that are deployable on laptops or other edge devices without reliance on GPUs. Awardees:
|
Human Motion Generation (HuMoGen) Motion is one of the fundamental attributes of human (and animal) life and underlies our actions, gestures, and behavior. The capture and synthesis of human motion have been among the core areas of interest for the CVPR community and facilitate a variety of applications such as avatar creation, 3D character animation, AR/VR, crowd simulation, and sports analytics, among many others.
|
Image Matching Challenge 2024 (Kaggle) Challenge Period: 25 March – 3 June 2024 Workshop Date: The goal of this workshop was to construct precise 3D maps using sets of images in diverse scenarios and environments, from world heritage sites to night-time images or transparent objects. Awardees:
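As an illustration of the underlying task (image matching that feeds structure-from-motion), here is a minimal local-feature matching sketch using SIFT and Lowe's ratio test; the image paths are placeholders, and winning solutions typically rely on far stronger learned matchers.

```python
# Minimal two-view matching sketch (SIFT + ratio test); image paths are placeholders.
import cv2

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe ratio test
print(f"{len(good)} tentative correspondences between the two views")
```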
|
9th New Trends in Image Restoration and Enhancement (NTIRE) Workshop and Challenges Challenge Period: 30 Jan - 22 March, 2024 Workshop Date: 17 June 2024 Image restoration, enhancement and manipulation are key computer vision tasks, aiming at restoring degraded image content, filling in missing information, or performing the transformation and/or manipulation needed to achieve a desired target (with respect to perceptual quality, content, or the performance of applications working on such images). There is an ever-growing range of applications in fields such as surveillance, the automotive industry, electronics, remote sensing, and medical image analysis. The emergence and ubiquitous use of mobile and wearable devices offer another fertile ground for additional applications and faster methods. This workshop provides an overview of the new trends and advances in those areas. Moreover, it offers an opportunity for academic and industrial attendees to interact and explore collaborations. This Workshop had 17 different associated challenges, including:
· Dense and Non-Homogeneous Dehazing
· Blind Enhancement of Compressed Image
· Shadow Removal - Track 1 Fidelity
· Light Field Image Super-Resolution - Track 1 Fidelity
· Light Field Image Super-Resolution - Track 2 Efficiency
· Stereo Image Super-Resolution - Track 1 Bicubic
· Stereo Image Super-Resolution - Track 2 Realistic
· HR Depth from Images of Specular and Transparent Surfaces - Track 1 Stereo
· HR Depth from Images of Specular and Transparent Surfaces - Track 2 Mono
· Bracketing Image Restoration and Enhancement - Track 2
· Quality Assessment for AI-Generated Content - Track 1 Image
· Quality Assessment for AI-Generated Content - Track 2 Video
· Restore Any Image Model (RAIM) in the Wild
· Short-form UGC Video Quality Assessment
· RAW Burst Alignment and ISP Challenge
Awardees:
|
Challenge Period: 25 March – 4 June 2024 Workshop Date: 17 June 2024 3D Food Reconstruction is an innovative venture into the intersection of computer vision and culinary arts, with the goal of reconstructing three-dimensional models of food items from two-dimensional images. This challenge is designed to push the boundaries of 3D reconstruction technology by applying it to a dynamic and uncontrolled setting, such as capturing food images during eating occasions. By reconstructing 3D models with the correct size, this technology can play a vital role in tracking nutritional intake and helping individuals maintain a healthy diet. This endeavor not only aims to enhance the sharing of food experiences in three dimensions but also has significant potential applications in nutrition and health monitoring. Awardees:
|
Mobile Intelligent Photography and Imaging Developing and integrating advanced image sensors with novel algorithms in camera systems is increasingly common given the growing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunities for in-depth exchange between industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Together with the workshop, we organize a few exciting challenges and invite renowned researchers from both industry and academia to share their insights and recent work. Our challenge includes three tracks: Few-shot Raw Image Denoising, Demosaic for HybridEVS Camera, and Nighttime Flare Removal. Awardees:
Second Place: Team: BigGuy; Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang from Tianjin University, Shenzhen MicroBT Electronics Technology Co. Ltd, China, Shanghai Artificial Intelligence Laboratory, China. Third Place: Team: SFNet-FR; Florin-Alexandru Vasluianu, Zongwei Wu, George Ciubotariu, Radu Timofte from Computer Vision Lab, CAIDAS & IFI, University of Wurzburg, Germany
Second Place: Team: Samsung MX(Mobile eXperience) Business & Samsung Research China - Beijing (SRC-B); Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, Zikun Liu, Wenhan Luo from Samsung Research China - Beijing (SRC-B), Department of Camera Innovation Group, Samsung Electronics, Sun Yat-sen University Third Place: Team: AIIA; Mingde Qiao, Junjun Jiang, Kui Jiang, Yao Xiao, Chuyang Sun, Jinhui Hu and Weijian Ruan from Harbin Institute of Technology, Smart City Research Institute of China, Electronics Technology Group Corporation
Second Place: Team: lolers; Jun Cao, Cheng Li, Shu Chen, Liang Ma from Xiaomi Inc., China Third Place: Team: Lumos Demosaicker; Shiyang Zhou, Haijin Zeng, Kai Feng, Yongyong Chen, Jingyong Su from Harbin Institute of Technology (Shenzhen), IMEC-UGent, Northwestern Polytechnical University |
Multimodal Algorithmic Reasoning Workshop and SMART-101 Challenge Challenge period: 28 March – 8 June 2024 Workshop Date: 17 June 2024 A focus of the Multimodal Algorithmic Reasoning (MAR) workshop is to nudge the vision community to make progress on building neural networks that have human-like intelligence abilities for abstraction, inference, and generalization. To this end, we conducted the SMART-101 challenge as part of the workshop. This challenge is based on the Simple Multimodal Algorithmic Reasoning Task (SMART) and the SMART-101 dataset consisting of vision-and-language algorithmic reasoning puzzles, which are designed specifically for children in the 6–8 age group. The puzzles demonstrate the need for algorithmic reasoning on abstract visual puzzles, which we believe would be a useful test bed for evaluating the core knowledge competencies of multimodal large language models – a topic of great interest and excitement in computer vision currently. The solution to each puzzle in the challenge needs a mix of various basic mathematical, compositional, and algorithmic reasoning skills, including the knowledge to use basic arithmetic, algebra, spatial reasoning, logical reasoning, path tracing, and pattern matching. The challenge was based on a private test set and the evaluation used the Eval.AI platform. Awardees: ● Winner: Jinwoo Ahn, Junhyeok Park, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn,Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo, and Eun-Sol Kim, Hanyang University, South Korea ● Runner-up: Zijian Zhang and Wei Liu, Harbin Institute of Technology, China |
OpenSun3D 2nd Workshop on Open-Vocabulary 3D Scene Understanding Challenge Period: 17 April - 14 June 2024 Workshop date: 18 June 2024 The ability to perceive, understand and interact with arbitrary 3D environments is a long-standing research goal with applications in AR/VR, robotics, health and industry. Many 3D scene understanding methods are largely limited to recognizing a closed set of pre-defined object classes. In the first track of our workshop challenge, we focus on open-vocabulary 3D object instance search. Given a 3D scene and an open-vocabulary, text-based query, the goal is to localize and densely segment all object instances that fit best with the specified query. If there are multiple objects that fit the given prompt, each of these objects should be segmented and labeled as a separate instance. The list of queries can refer to long-tail objects, or can include descriptions of object properties such as semantics, material type, and situational context. Most existing methods in 3D scene understanding are heavily focused on understanding the scene on an object level by detecting or segmenting the 3D object instances. However, identifying 3D objects is only an intermediate step towards a more fine-grained goal. In real-world applications, agents need to successfully detect and interact with the functional interactive elements in the scene, such as knobs, handles and buttons, and reason about their purpose in the scene context. Through interacting with these elements, agents can accomplish diverse tasks, such as opening a drawer or turning on the light. In the second track of our workshop challenge, we focus on open-vocabulary 3D affordance grounding. Given a 3D scene and an open-vocabulary, text-based description of a task (e.g., "open the fridge"), the goal is to segment the functional interactive element that the agent needs to interact with (e.g., fridge handle) to successfully accomplish the task. Awardees:
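To make the first track's query formulation concrete, below is an illustrative sketch (not a submitted method) of ranking 3D instances against a free-form text query in CLIP embedding space; `instance_feats` is a hypothetical tensor of per-instance features assumed to be aggregated from multi-view images.

```python
# Illustrative open-vocabulary query ranking. Assumes per-instance CLIP-space
# features (`instance_feats`, shape [num_instances, dim]) were precomputed, e.g.
# by aggregating multi-view image features per 3D instance -- a hypothetical input.
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def rank_instances(instance_feats, query):
    """Return instance indices sorted by cosine similarity to a text query."""
    with torch.no_grad():
        tokens = clip.tokenize([query]).to(device)
        text_feat = model.encode_text(tokens).float()
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        feats = instance_feats.to(device).float()
        feats = feats / feats.norm(dim=-1, keepdim=True)
        sims = feats @ text_feat.T                    # (num_instances, 1)
    return sims.squeeze(1).argsort(descending=True)

# e.g. order = rank_instances(instance_feats, "the fridge handle")
```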
|
20th Perception Beyond the Visible Spectrum workshop series (IEEE PBVS) Challenge Period: Dec 2023 - May 2024 Workshop Date: 18 June 2024 5th Thermal Images Super-Resolution Challenge (TISR) The fifth Thermal Image Super-Resolution challenge introduces a recently acquired benchmark dataset captured with cross-spectral sensors - visible (Basler camera) and thermal (TAU2 camera). It consists of two Tracks (175 teams registered across both challenges). TISR Track 1 features a single evaluation task, requiring participants to generate an x8 super-resolution thermal image from the given low-resolution thermal images. The challenge uses noiseless images bicubic-downsampled by a factor of 8 as input. TISR Track 1 winners:
TISR Track 2 consists of two evaluation tasks using the newly acquired dataset. The first evaluation involves generating an x8 super-resolution thermal image, while the second evaluation requires participants to generate an x16 super-resolution thermal image. In both cases, the provided high-resolution visible image should be used as guidance for enhancing the low-resolution thermal image. The proposed architecture in this track must use visible images as guidance. TISR Track 2 winners:
Published paper: Thermal Image Super-Resolution Challenge Results - PBVS 2024, Rafael E. Rivadeneira, Angel D. Sappa, Chenyang Wang, Junjun Jiang, Zhiwei Zhong, Peilin Chen, and Shiqi Wang _____________________ Multi-modal Aerial View Imagery Challenge: Classification (MAVIC-C) The 2024 MAVIC-C challenge aimed to advance recognition models leveraging SAR and EO imagery by integrating these modalities to improve object recognition (OR) systems. Building on previous challenges, it introduced the enhanced UNICORN dataset and a revised competition format focused on SAR classification. The challenge evaluated model robustness through out-of-distribution measures and traditional accuracy metrics. It attracted significant participation, with 146 teams registering and 50 submitting valid algorithms for rigorous assessment. The winning team, IQSKJSP, achieved the highest total score, accuracy, and AUC. This year's challenge underscored the importance of multi-modal data integration to enhance OR model performance. MAVIC-C Winners:
Published paper: Multi-modal Aerial View Image Challenge: SAR Classification, Spencer Low, Oliver Nina, Dylan Bowald, Angel D. Sappa, Nathan Inkawhich, and Peter Bruns _____________________ Multi-modal Aerial View Imagery Challenge: Translation (MAVIC-T) The 2024 MAVIC-T challenge builds on the previous year's focus on Synthetic Aperture Radar (SAR) and Electro-Optical (EO) imagery by expanding to include infrared (IR) and multiple collections from various times and locations. This year's challenge aims to enhance sensor data utility through modality conversion, thereby increasing data diversity and mitigating coverage gaps. It focuses on four main translation tasks: RGB to IR, SAR to EO, SAR to IR, and SAR to RGB. By leveraging the unique advantages of different sensors, such as SAR's all-weather capabilities and IR's utility in thermal imaging, the challenge seeks to overcome the limitations associated with individual sensor modalities. The development of robust models for multi-modal translation can enhance applications like vision-aided navigation in GNSS-denied environments and automatic target recognition (ATR) tasks, where data from EO imagery can be used to train SAR ATRs. The accompanying paper details the introduction of the MAGIC dataset and the advancements in sensor translation methods, highlighting the challenge's role in pushing the boundaries of multi-modal image analysis, and provides an overview of the challenge dataset, evaluation metrics, results from different teams, and descriptions of the top approaches. The 2024 MAVIC-T challenge saw participation from 95 teams, with the results of the top ten teams summarized in the paper. The top-performing methods all built on pix2pixHD and used LPIPS as a loss function. While performance on the SAR-RGB and SAR-IR translation tasks was generally similar among the top teams, the SAR-EO and RGB-IR translation tasks were the main differentiators. The winning team, NJUST-KMG, achieved the best overall score across all tasks. MAVIC-T Winners:
Published paper: Multi-modal Aerial View Image Challenge: Sensor Domain Translation, Spencer Low, Oliver Nina, Dylan Bowald, Angel D. Sappa, Nathan Inkawhich, and Peter Bruns |
4th International Workshop on Physics-Based Vision Meets Deep Learning (PBDL) Workshop Date: 17 June 2024 The Physics Based Vision meets Deep Learning (PBDL) competition is a 3-month event that focuses on addressing the challenges of low-light image enhancement and high dynamic range imaging. The competition is organized by Prof. Fu and colleagues. Attracting over 300 participants and 100 teams, the competition has garnered more than 500 submissions from a diverse range of institutions, including major companies and top universities. The competition features eight tracks: Low-light Object Detection and Instance Segmentation, Low-light Raw Video Denoising with Realistic Motion, Low-light sRGB Image Enhancement, Extreme Low-Light Image Denoising, Low-light Raw Image Enhancement, HDR Reconstruction from a Single Raw Image, High-speed HDR Video Reconstruction from Events, and Raw Image Based Over-Exposure Correction. This challenge aims to advance the integration of physical imaging and deep learning technologies. Awardees (track, team name, team leader, affiliation):
|
Pixel-level Video Understanding in the Wild Challenge Challenge Period: 15 May – 25 May, 2024 Workshop Date: 17 June 2024 The Pixel-level Video Understanding in the Wild (PVUW) challenge aims at challenging yet practical video understanding. This year, we added two new tracks: the Complex Video Object Segmentation track, based on MOSE, and the Motion Expression guided Video Segmentation track, based on MeViS. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset, MeViS, to study natural language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. Awardees:
|
RetailVision: Overview and Amazon Deep Dive The challenge consists of two tracks. The Video Temporal Action Localization (TAL) and Spatial Temporal Action Localization (STAL) challenges aim at localizing products associated with actions of interest in video. The videos are captured in a physical retail grocery store, and the capture system consists of a camera mounted on a regular shopping cart looking inward at the basket volume. The TAL challenge consists of training a model that localizes the actions of interest temporally (along the time axis) in each video in the dataset. STAL, on the other hand, involves localizing the product associated with the action spatially and temporally in the video. The action of interest in this challenge is one of three commonplace actions performed by people shopping with grocery carts: Take (putting a product into the basket), Return (removing a product from the basket), and Rummage (moving products around in the basket). We evaluate model performance based on frame-mAP (temporal localization) and tube-mAP (spatio-temporal localization). The Multi-Modal Product Retrieval (MPR) challenge's goal is to retrieve the product identity from an open pool of candidate products. Participants are given a training dataset consisting of (i) images for different products, and (ii) textual descriptions of the products, and are expected to design methods to accurately retrieve the product identity by measuring similarity between images and textual descriptions. For evaluation, a held-out set of probe images and a pool of textual descriptions of catalog products are passed to the model. The model is expected to output a ranked list of catalog products based on the similarity between images and descriptions. We use Cumulative Matching Characteristics (CMC) to evaluate the performance of each solution. Awardees:
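As a concrete illustration of the CMC metric described above (not the organizers' evaluation code), here is a minimal sketch: every probe is ranked against the catalog by feature similarity, and CMC@k is the fraction of probes whose true product appears in the top k. The feature and label arrays are placeholders, and every probe's product is assumed to be present in the catalog.

```python
# Minimal Cumulative Matching Characteristics (CMC) sketch; inputs are placeholders.
import numpy as np

def cmc(probe_feats, probe_labels, catalog_feats, catalog_labels, topk=(1, 5, 10)):
    # cosine similarity between every probe and every catalog entry
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    c = catalog_feats / np.linalg.norm(catalog_feats, axis=1, keepdims=True)
    ranking = np.argsort(-(p @ c.T), axis=1)            # best match first
    hits = catalog_labels[ranking] == probe_labels[:, None]
    first_hit = hits.argmax(axis=1)                     # rank of the true product (assumed present)
    return {k: float((first_hit < k).mean()) for k in topk}
```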
|
Rhobin: 2nd Workshop on Reconstruction of Human-Object Interactions Challenge Period: 5 Feb – 30 May, 2024 Workshop Date: 17 June 2024 Given the importance of human-object interaction, we propose five challenges in reconstructing 3D humans and objects and estimating 3D human-object and human-scene contact from monocular RGB images. We have seen promising progress in reconstructing human body meshes or estimating 6DoF object pose from single images. However, most of these works focus on occlusion-free images, which is not realistic for close human-object interaction, since humans and objects occlude each other. This makes inference more difficult and poses challenges to existing state-of-the-art methods. Similarly, methods estimating 3D contacts have also seen rapid progress, but are restricted to scanned or synthetic datasets, and struggle to generalize to in-the-wild scenarios. In this workshop, we want to examine how well existing human and object reconstruction and contact estimation methods work under more realistic settings and, more importantly, understand how they can benefit each other for accurate interaction reasoning. Specifically, we have the following five challenges: 3D human reconstruction, 6DoF pose estimation of rigid objects, joint reconstruction of human and object, video-based tracking, and 3D contact prediction from 2D images. Awardees:
|
ScanNet++ Novel View Synthesis and 3D Semantic Understanding Challenge Workshop Date: 18 June 2024 The ScanNet++ challenge offers the first benchmark challenge for novel view synthesis in large-scale 3D scenes, along with high-fidelity, large-vocabulary 3D semantic scene understanding. Winners have submitted leading novel view synthesis and 3D semantic understanding methods on the ScanNet++ benchmark. Awardees:
|
SyntaGen - Harnessing Generative Models for Synthetic Visual Datasets Workshop date: 17 June 2024 The competition aims to drive innovation in creating high-quality synthetic datasets using pretrained Stable Diffusion and the 20 class names from PASCAL VOC 2012 for semantic segmentation. The quality of these datasets is evaluated by training a DeepLabv3 model on them and assessing its performance on a private test set, with a sample validation set from PASCAL VOC 2012. Submissions are ranked based on the mIoU metric. This framework reflects the practical use of synthetic datasets as replacements for real datasets. Awardees:
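For reference, a minimal sketch of the mIoU metric used for ranking (not the organizers' evaluation script): a confusion matrix is accumulated over the test set and per-class IoU is averaged. The class count (21 = 20 VOC classes plus background) and the input arrays are placeholders.

```python
# Minimal mIoU sketch; `pred` and `label` are integer class maps (placeholders).
import numpy as np

NUM_CLASSES = 21  # 20 PASCAL VOC classes + background

def update_confusion(conf, pred, label):
    mask = label < NUM_CLASSES                         # ignore void pixels
    idx = NUM_CLASSES * label[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=NUM_CLASSES ** 2).reshape(NUM_CLASSES, NUM_CLASSES)
    return conf

def mean_iou(conf):
    inter = np.diag(conf).astype(float)
    union = conf.sum(0) + conf.sum(1) - np.diag(conf)
    iou = np.where(union > 0, inter / np.maximum(union, 1), np.nan)
    return float(np.nanmean(iou))                      # classes absent from GT and preds are skipped
```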
|
1st Workshop on Urban Scene Modeling Workshop date: 17 June 2024 Challenge period: S23DR Challenge 15 Mar – 14 June 2024; Building3D Challenge 15 February – 31 May 2024 Structured Semantic 3D Reconstruction (S23DR) Challenge - The objective of this competition is to facilitate the development of methods for transforming posed images (sometimes also called "oriented images") / SfM outputs into a structured geometric representation (wireframe) from which semantically meaningful measurements can be extracted. In short: More Structured Structure from Motion. Building3D Challenge – Building3D is an urban-scale, publicly available dataset consisting of more than 160 thousand buildings with corresponding point clouds, meshes, and wireframe models covering 16 cities in Estonia. For this challenge, approximately 36,000 buildings from the city of Tallinn are used as the training and testing dataset. We require algorithms to take the original point cloud as input and regress the wireframe model. Awardees: S23DR Challenge:
Building3D Challenge:
|
The 3rd CVPR Workshop on Vision Datasets Understanding Workshop Date: 17 June 2024 Data is the fuel of computer vision, on which state-of-the-art systems are built. A robust object detection system not only needs a strong model architecture and learning algorithms but also relies on a comprehensive large-scale training set. Despite the pivotal significance of datasets, existing research in computer vision is usually algorithm-centric. Compared with the abundance of algorithm-centric works in domain adaptation, for example, quantitative understanding of the domain gap is much more limited. As a result, there are currently few investigations into the representation of datasets, while, in contrast, an abundance of literature concerns ways to represent images or videos, the essential elements of datasets. The 3rd VDU workshop aims to bring together research works and discussions focusing on analyzing vision datasets, as opposed to the commonly seen algorithm-centric counterparts. Awardees:
|
Visual Anomaly and Novelty Detection 2.0 2024 Challenge Period: 15 April to 1 June 2024 This year our challenge aims to bring visual anomaly detection closer to industrial visual inspection, which has wide real-world applications. We look forward to participants from both academia and industry. Proudly sponsored by Intel, this challenge consists of two categories:
Participants could choose a category or enter both in two separate submissions. These challenge categories aim to advance the existing anomaly detection literature and increase its adoption in real-world settings. We invited the global community of innovators, researchers, and technology enthusiasts to engage with these challenges and contribute towards advancing anomaly detection technologies in real-world scenarios. From April 15th to June 1st, 2024, this global community could showcase their ideas on how to solve these challenges in the visual anomaly detection field. For more information about the submission and the challenge please visit the Hackster.io webpage https://www.hackster.io/contests/openvino2024. Awardees:
Category 1 - First Place: Project: ARNet for Robust Anomaly Detection, Babar Hussain, Tcl_liu
Category 1 - Second Place: Project: CanhuiTang_submission_v2, Team IAIR: Canhui Tang, Mang Cao, Sanping Zhou
Category 1 - Honorable Mentions: Project: Final submission in VAND2, Amal Alsulamy, Munirah Alyahya, Nouf Alajmi. Project: Ensemble PatchCore, Yukino Tusuzuki
Category 2 - First Place: Project: Anomaly MoE, Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Jinqiao Wang
Category 2 - Second Place: Project: RJVoyagers, Nestor Bao, YJ Chen
Category 2 - Honorable Mentions: Project: Locore, Xi Jiang, Hanqiu Deng. Project: MVTec LOCO Diffusion-AD, Chu Sam, Jay Liu
|
Workshop Date: 18 June 2024 Our goal for this workshop is to educate researchers about the technological needs of people with vision impairments while empowering researchers to improve algorithms to meet these needs. A key component of this event will be to track progress on six dataset challenges, where the tasks are to answer visual questions, ground answers, recognize visual questions with multiple answer groundings, recognize objects in few-shot learning scenarios, locate objects in few-shot learning scenarios, and classify images in a zero-shot setting. The second key component of this event will be a discussion about current research and application issues, including invited speakers from both academia and industry who will share their experiences in building today’s state-of-the-art assistive technologies as well as designing next-generation tools. Awardees:
|
What is next in multimodal foundation models? Workshop Date: 18 June 2024 Challenge Period: 20 March – 20 May 2024 Multimodal Foundation Models (MMFMs) have shown unprecedented performance in many computer vision tasks. However, on some very specific tasks like document understanding, their performance is still underwhelming. In order to evaluate and improve these strong multi-modal models for the task of document image understanding, we harnessed a large amount of publicly available and privately gathered data and proposed a challenge. The challenge ran in two separate phases. Awardees:
|