Registration Desk: Registration / Badge Pickup Mon 17 Jun 07:00 a.m.
9th New Trends in Image Restoration and Enhancement Workshop and Challenges Mon 17 Jun 08:00 a.m.
HYBRID: Room Arch 204, Seattle Convention Center
SCHEDULE: https://www.cvlai.net/ntire/2024/#schedule
Mon 17 Jun 08:00 - 18:00 PDT
Lunch Break: 12:00-13:00
Poster session: 16:00-18:00
Workshop: Efficient Large Vision Models Mon 17 Jun 08:00 a.m.
Workshop: Domain adaptation, Explainability and Fairness in AI for Medical Image Analysis (DEF-AI-MIA) Mon 17 Jun 08:00 a.m.
Workshop: Computer Vision for Mixed Reality Mon 17 Jun 08:00 a.m.
Workshop: 8th AI City Challenge Mon 17 Jun 08:00 a.m.
Multimodal Algorithmic Reasoning Workshop Mon 17 Jun 08:25 a.m.
Workshop: SyntaGen: Harnessing Generative Models for Synthetic Visual Datasets Mon 17 Jun 08:25 a.m.
The 7th Workshop and Challenge Bridging the Gap between Computational Photography and Visual Recognition (UG2+) Mon 17 Jun 08:30 a.m.
5th International Workshop on Large Scale Holistic Video Understanding Mon 17 Jun 08:30 a.m.
2nd Workshop on Foundation Models Mon 17 Jun 08:30 a.m.
Workshop: AIS: Vision, Graphics and AI for Streaming Mon 17 Jun 08:30 a.m.
1st Workshop on Dataset Distillation for Computer Vision Mon 17 Jun 08:30 a.m.
In the past decade, deep learning has advanced mainly by training ever-larger models on ever-larger datasets, at the price of massive computation and expensive training hardware.
As a result, research on state-of-the-art models is increasingly monopolized by large companies, while research groups with limited resources, such as universities and small companies, struggle to compete.
Reducing the training dataset size while preserving the effect of training on the full data is therefore significant for lowering training cost, enabling green AI, and allowing university research groups to engage in the latest research.
This workshop focuses on the emerging research field of dataset distillation (DD), which aims to compress a large training dataset into a tiny, informative one (e.g., 1% of the size of the original data) while maintaining the performance of models trained on it. Beyond general-purpose efficient model training, dataset distillation can also greatly facilitate downstream tasks: neural architecture/hyperparameter search by speeding up model evaluation, continual learning by producing compact memory, federated learning by reducing data transmission, and privacy-preserving learning by avoiding the release of raw private data. Dataset distillation is also closely related to research topics including core-set selection, prototype generation, active learning, few-shot learning, generative models, and the broad area of learning from synthetic data.
Although DD has become an important paradigm in various machine learning tasks, its potential in computer vision (CV) applications such as face recognition, person re-identification, and action recognition is far from fully exploited.
Moreover, DD has rarely been demonstrated effectively in advanced computer vision tasks such as object detection, image segmentation, and video understanding.
Further, numerous unexplored challenges and unresolved issues exist in the realm of DD.
One such challenge is finding efficient ways to adapt existing DD workflows, or to design entirely new ones, for a wide range of computer vision tasks beyond image classification.
An additional challenge lies in improving the scalability of DD methods so that they can compress real-world datasets beyond the scale of ImageNet.
The purpose of this workshop is to bring together researchers and practitioners interested in dataset distillation for computer vision and to develop the next generation of DD methods for computer vision applications.
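To make the dataset distillation objective concrete, the following minimal PyTorch-style sketch distills synthetic images by matching the gradients a freshly initialized model produces on synthetic data to those it produces on real data (one of several DD formulations, not any particular workshop method); `model_fn`, `real_loader`, and all hyperparameters are placeholders.

```python
# Minimal sketch of dataset distillation via gradient matching (hypothetical
# setup; `model_fn`, `real_loader`, and all hyperparameters are placeholders).
import torch
import torch.nn.functional as F

def distill(model_fn, real_loader, num_classes=10, ipc=10, steps=1000, lr=0.1):
    # Learnable synthetic images: `ipc` images per class, CIFAR-sized here.
    syn_x = torch.randn(num_classes * ipc, 3, 32, 32, requires_grad=True)
    syn_y = torch.arange(num_classes).repeat_interleave(ipc)
    opt = torch.optim.SGD([syn_x], lr=lr)

    for step in range(steps):
        model = model_fn()                       # freshly initialized network
        real_x, real_y = next(iter(real_loader)) # placeholder batch sampling

        # Gradients of the loss on real data w.r.t. model parameters.
        g_real = torch.autograd.grad(
            F.cross_entropy(model(real_x), real_y), model.parameters())
        # Gradients of the loss on synthetic data (keep graph to update syn_x).
        g_syn = torch.autograd.grad(
            F.cross_entropy(model(syn_x), syn_y), model.parameters(),
            create_graph=True)

        # Match the two gradient sets layer by layer.
        loss = sum(F.mse_loss(a, b) for a, b in zip(g_syn, g_real))
        opt.zero_grad()
        loss.backward()
        opt.step()

    return syn_x.detach(), syn_y
```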
4th Workshop on Physics Based Vision meets Deep Learning (PBDL2024) Mon 17 Jun 08:30 a.m.
First Workshop on Efficient and On-Device Generation (EDGE) Mon 17 Jun 08:30 a.m.
4th International Workshop on Long-form Video Understanding: Towards Multimodal AI Assistant and Copilot Mon 17 Jun 08:30 a.m.
Workshop: Foundation Models for Medical Vision Mon 17 Jun 08:30 a.m.
The 4th Workshop of Adversarial Machine Learning on Computer Vision: Robustness of Foundation Models Mon 17 Jun 08:30 a.m.
The 3rd International Workshop on Federated Learning for Computer Vision (FedVision-2024) Mon 17 Jun 08:30 a.m.
4th Mobile AI Workshop and Challenges Mon 17 Jun 08:30 a.m.
Workshop: Computer Vision in the Wild Mon 17 Jun 08:30 a.m.
MetaFood Workshop (MTF) Mon 17 Jun 08:30 a.m.
4th Workshop on CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling Mon 17 Jun 08:30 a.m.
1st Workshop on Urban Scene Modeling: Where Vision Meets Photogrammetry and Graphics Mon 17 Jun 08:30 a.m.
The Fifth Workshop on Fair, Data-efficient, and Trusted Computer Vision Mon 17 Jun 08:30 a.m.
2nd Workshop on Multimodal Content Moderation Mon 17 Jun 08:30 a.m.
The 5th Face Anti-Spoofing Workshop Mon 17 Jun 08:30 a.m.
Workshop on Computer Vision for Fashion, Art, and Design Mon 17 Jun 08:30 a.m.
Workshop: AI for 3D Generation Mon 17 Jun 08:30 a.m.
Workshop: AI for Content Creation (AI4CC) Mon 17 Jun 08:30 a.m.
Workshop: ViLMa – Visual Localization and Mapping Mon 17 Jun 08:30 a.m.
Workshop: New Challenges in 3D Human Understanding Mon 17 Jun 08:30 a.m.
First Joint Egocentric Vision (EgoVis) Workshop Mon 17 Jun 08:30 a.m.
Workshop: VAND 2.0: Visual Anomaly and Novelty Detection Mon 17 Jun 08:30 a.m.
2nd Workshop on Scene Graphs and Graph Representation Learning Mon 17 Jun 08:30 a.m.
Workshop: CV4Science 2024: Using Computer Vision for the Sciences Mon 17 Jun 08:30 a.m.
Tool-Augmented VIsion Workshop Mon 17 Jun 08:45 a.m.
Second Workshop for Learning 3D with Multi-View Supervision Mon 17 Jun 08:45 a.m.
Tutorial: Sijia Liu · Yang Liu · Nathalie Baracaldo · Eleni Triantafillou
Machine Unlearning in Computer Vision: Foundations and Applications
This tutorial aims to offer a comprehensive understanding of emerging machine unlearning (MU) techniques. These techniques are designed to accurately assess the impact of specific data points, classes, or concepts (e.g., related to copyrighted information, biases and stereotypes, and personally identifying data) on model performance and efficiently eliminate their potentially harmful influence within a pre-trained model. With the recent shift to foundation models, MU has become indispensable, as re-training from scratch is prohibitively costly in terms of time, computational resources, and finances. Despite increasing research interest, MU for vision tasks remains significantly underexplored compared to its prominence in the security and privacy (SP) field. Within this tutorial, we will delve into the algorithmic foundations of MU methods, including techniques such as localization-informed unlearning, unlearning-focused finetuning, and vision model-specific optimizers. We will provide a comprehensive and clear overview of the diverse range of applications for MU in CV. Furthermore, we will emphasize the importance of unlearning from an industry perspective, where modifying the model during its life-cycle is preferable to re-training it entirely, and where metrics to verify the unlearning process become paramount. Our tutorial will furnish the general audience with sufficient background information to grasp the motivation, research progress, opportunities, and ongoing challenges in MU.
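As a rough illustration of unlearning-focused finetuning (a simple baseline in this space, not the presenters' specific methods), the sketch below raises the loss on a "forget" set via gradient ascent while preserving performance on a "retain" set; the model, loaders, and weighting are placeholders.

```python
# Minimal sketch of unlearning-focused finetuning: ascend on the forget set
# while retaining performance on the retain set (placeholders throughout).
import torch
import torch.nn.functional as F

def unlearn(model, forget_loader, retain_loader, epochs=1, lr=1e-4, alpha=1.0):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for (fx, fy), (rx, ry) in zip(forget_loader, retain_loader):
            # Ascend on the forget data (negative loss) ...
            forget_loss = -F.cross_entropy(model(fx), fy)
            # ... while keeping performance on the retain data.
            retain_loss = F.cross_entropy(model(rx), ry)
            loss = alpha * forget_loss + retain_loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```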
Tutorial: Zhengyuan Yang · Linjie Li · Zhe Gan · Chunyuan Li · Jianwei Yang
Recent Advances in Vision Foundation Models
This tutorial covers the advanced topics in designing and training vision foundation models, including the state-of-the-art approaches and principles in (i) learning vision foundation models for multimodal understanding and generation, (ii) benchmarking and evaluating vision foundation models, and (iii) agents and other advanced systems based on vision foundation models.
Tutorial: Edward Kim · Sanjit Seshia · Daniel Fremont · Jinkyu Kim · Kimin Lee · Hazem Torfah · Necmiye Ozay · Parasara Sridhar Duggirala · Marcell Vazquez-Chanlatte
Autonomous systems, such as self-driving cars or intelligent robots, increasingly operate in complex, stochastic environments where they dynamically interact with multiple entities (human and robot). There is a need to formally model and generate such environments in simulation, for use cases that span synthetic training-data generation and rigorous evaluation of safety. In this tutorial, we provide an in-depth introduction to Scenic, a simulator-agnostic probabilistic programming language for modeling complex multi-agent physical environments with stochasticity and spatio-temporal constraints. Scenic has been used in a variety of domains such as self-driving, aviation, indoor robotics, multi-agent systems, and augmented/virtual reality. Using Scenic and associated open-source tools, one can (1) model and sample from distributions with spatial and temporal constraints, (2) generate synthetic data in a controlled, programmatic fashion to train and test machine learning components, (3) reason about the safety of AI-enabled autonomous systems, (4) automatically find edge cases, (5) debug and root-cause failures of AI components, including for perception, and (6) bridge the sim-to-real gap in autonomous system design. We will provide a hands-on tutorial on the basics of Scenic and its applications, how to create Scenic programs and your own new applications on top of Scenic, and how to interface the language with your simulator/renderer of choice. For more information on Scenic, please visit the website: https://scenic-lang.org
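To give a flavor of how Scenic is driven from Python, the hedged sketch below compiles a tiny program with a random relative offset and a hard spatial constraint, then rejection-samples scenes; the program syntax and the `scenarioFromString`/`generate` calls are approximations that should be checked against the Scenic documentation.

```python
# Hedged sketch of driving Scenic from Python; syntax and API details are
# approximations -- consult https://scenic-lang.org for the authoritative forms.
import scenic

SOURCE = """
ego = new Object at (0, 0)
other = new Object offset by Range(-5, 5) @ Range(10, 20)
require (distance from ego to other) > 8
"""

scenario = scenic.scenarioFromString(SOURCE)
for _ in range(3):
    scene, n_iterations = scenario.generate()   # rejection-samples a scene
    print([obj.position for obj in scene.objects])
```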
Workshop: AI4Space 2024 Mon 17 Jun 09:00 a.m.
7th Workshop on Autonomous Driving (WAD) Mon 17 Jun 09:00 a.m.
The CVPR 2024 Workshop on Autonomous Driving (WAD) brings together leading researchers and engineers from academia and industry to discuss the latest advances in autonomous driving. Now in its 7th year, the workshop has been continuously evolving with this rapidly changing field and now covers all areas of autonomy, including perception, behavior prediction and motion planning. In this full-day workshop, our keynote speakers will provide insights into the ongoing commercialization of autonomous vehicles, as well as progress in related fundamental research areas. Furthermore, we will host a series of technical benchmark challenges to help quantify recent advances in the field, and invite authors of accepted workshop papers to present their work.
Workshop: Sight and Sound Mon 17 Jun 09:00 a.m.
Tutorial: Matteo Poggi
Deep Stereo Matching in the Twenties
For decades, stereo matching was approached with hand-crafted algorithms that measure the visual similarity between local patterns in the two images and propagate this information globally. Since 2015, deep learning has driven a paradigm shift in this field, leading the community to design end-to-end deep networks capable of matching pixels. This revolution brought stereo matching to a whole new level of accuracy, yet not without drawbacks. Indeed, some hard challenges remained unsolved by the first generation of deep stereo models: they often failed to generalize across different domains -- e.g., from synthetic to real, from indoor to outdoor -- or to deal with high-resolution images.
That was, however, three years ago. These and other challenges have since been tackled by the research community in the Twenties, making deep stereo matching more mature and suitable as a practical solution for everyday applications. For instance, we now have networks that generalize much better from synthetic to real images, handle high-resolution images, and even estimate disparity correctly in the presence of non-Lambertian surfaces -- known to be among the ill-posed challenges for stereo. Accordingly, in this tutorial we aim to give a comprehensive overview of the state of the art in deep stereo matching, the architectural designs that have been crucial to reach this level of maturity, and how to select the best solution for estimating depth from stereo in real applications.
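To illustrate the core matching step shared by many end-to-end stereo networks, the sketch below builds a correlation cost volume over candidate disparities and converts it to a dense disparity map with a soft-argmin; feature extractors and the aggregation/refinement stages of real architectures are omitted.

```python
# Minimal sketch of a correlation cost volume plus soft-argmin, the basic
# matching block inside many end-to-end deep stereo networks (illustrative only).
import torch

def correlation_cost_volume(feat_left, feat_right, max_disp=64):
    """feat_left, feat_right: [B, C, H, W] feature maps from a shared encoder.
    Returns a [B, max_disp, H, W] volume of matching scores per disparity."""
    B, C, H, W = feat_left.shape
    volume = feat_left.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_left * feat_right).mean(dim=1)
        else:
            # Shift the right features by d pixels and correlate where valid.
            volume[:, d, :, d:] = (feat_left[..., d:] *
                                   feat_right[..., :-d]).mean(dim=1)
    return volume

def soft_argmin(volume):
    """Expected disparity under a softmax over the correlation scores."""
    disp_values = torch.arange(volume.shape[1], dtype=volume.dtype,
                               device=volume.device).view(1, -1, 1, 1)
    return (volume.softmax(dim=1) * disp_values).sum(dim=1)
```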
Workshop: Causal and Object-Centric Representations for Robotics Mon 17 Jun 09:00 a.m.
CVPR 2024 Biometrics Workshop Mon 17 Jun 09:00 a.m.
Workshop: EarthVision: Large Scale Computer Vision for Remote Sensing Imagery Mon 17 Jun 09:00 a.m.
Workshop: Foundation Models for Autonomous Systems Mon 17 Jun 09:00 a.m.
Workshop: Prompting in Vision Mon 17 Jun 09:00 a.m.
Tutorial: Xin Jin · Wenjun Zeng · Tao Yang · Yue Song · Nicu Sebe · Xingyi Yang · Xinchao Wang · Shuicheng Yan
Disentanglement and Compositionality in Computer Vision
This tutorial aims to explore the concepts of disentanglement and compositionality in the field of computer vision. These concepts play a crucial role in enabling machines to understand and interpret visual information with more sophistication and human-like reasoning. Participants will learn about advanced techniques and models that allow for the disentanglement of visual factors in images and the composition of these factors into more meaningful representations. All in all, disentanglement and compositionality are believed to be among the possible paths for AI to fundamentally understand the world and, eventually, achieve Artificial General Intelligence (AGI).
Workshop: CV 20/20: A Retrospective Vision Mon 17 Jun 12:45 p.m.
GenAI Media Generation Challenge for Computer Vision Workshop Mon 17 Jun 01:00 p.m.
2nd Workshop on Embodied "Humans": Symbiotic Intelligence between Virtual Humans and Humanoid Robots Mon 17 Jun 01:00 p.m.
Workshop: Data Curation and Augmentation in Enhancing Medical Imaging Applications Mon 17 Jun 01:00 p.m.
Workshop: Image Matching: Local Features and Beyond Mon 17 Jun 01:00 p.m.
Workshop on TDLCV: Topological Deep Learning for Computer Vision Mon 17 Jun 01:00 p.m.
Workshop: Rhobin 2024: The second Rhobin challenge on Reconstruction of Human-Object Interaction Mon 17 Jun 01:20 p.m.
3rd Workshop on Vision Datasets Understanding and DataCV Challenge Mon 17 Jun 01:30 p.m.
Tutorial: Benjamin Kimia · Timothy Duff · Ricardo Fabbri · Hongyi Fan
Efficient Homotopy Continuation for Solving Polynomial Systems in Computer Vision Applications
Minimal problems and their solvers play an important role in RANSAC-based approaches to several estimation problems in vision. Minimal solvers solve systems of equations, depending on data, which obey a "conservation of number" principle: for sufficiently generic data, the number of solutions over the complex numbers is constant. Homotopy continuation (HC) methods exploit not just this conservation principle but also the smooth dependence of solutions on problem data. Classical approaches to solving polynomial systems using Gröbner bases, resultants, elimination templates, etc. have been largely successful on smaller problems, but they cannot tackle larger polynomial systems with many solutions. While HC methods can solve these problems, they have been notoriously slow. Recent research by the presenters and other researchers has enabled efficient HC solvers capable of real-time solutions.
The main objective of this tutorial is to make this technology more accessible to the computer vision community. Specifically, after an overview of how such methods can be useful for solving problems in vision (e.g., absolute/relative pose, triangulation), we will describe some of the basic theoretical apparatus underlying HC solvers, including both local and global “probability-1” aspects. On the practical side, we will describe recent advances enabled by GPUs, learning-based approaches, and how to build your own HC-based minimal solvers.
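As a toy illustration of the continuation idea (far simpler than the multivariate, GPU-parallel trackers discussed in the tutorial), the sketch below deforms a start system with known roots into a target polynomial and corrects the tracked roots with a few Newton steps at each parameter value.

```python
# Minimal sketch of homotopy continuation for a single univariate polynomial
# (illustrative only; real HC solvers for vision handle multivariate systems,
# use predictor-corrector tracking with adaptive steps, and the "gamma trick").
import numpy as np

def poly_eval(coeffs, x):
    """Evaluate the polynomial given by `coeffs` and its derivative at x."""
    return np.polyval(coeffs, x), np.polyval(np.polyder(coeffs), x)

def track_roots(target, steps=100, newton_iters=5):
    """Track the roots of x^n - 1 (start system) to the roots of `target`."""
    n = len(target) - 1
    start = np.zeros(n + 1, dtype=complex)
    start[0], start[-1] = 1.0, -1.0                  # g(x) = x^n - 1
    roots = np.exp(2j * np.pi * np.arange(n) / n)    # known roots of g

    for k in range(1, steps + 1):
        t = k / steps
        # Homotopy H(x, t) = (1 - t) * g(x) + t * f(x), itself a polynomial.
        h = (1 - t) * start + t * np.asarray(target, dtype=complex)
        # Corrector: a few Newton iterations pull each path back onto H = 0.
        for _ in range(newton_iters):
            p, dp = poly_eval(h, roots)
            roots = roots - p / dp
    return roots

# Example: roots of x^3 - 2x + 1, which factors as (x - 1)(x^2 + x - 1).
print(np.sort_complex(track_roots([1, 0, -2, 1])))
```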
Workshop: Ethical Considerations in Creative Applications of Computer Vision Mon 17 Jun 01:30 p.m.
Workshop: Multimodalities for 3D Scenes Mon 17 Jun 01:30 p.m.
The Seventh International Workshop on Computer Vision for Physiological Measurement (CVPM) Mon 17 Jun 01:30 p.m.
Workshop on Virtual Try-On Mon 17 Jun 01:30 p.m.
Workshop: Pixel-level Video Understanding in the Wild Challenge Mon 17 Jun 01:30 p.m.
Workshop: Neural Rendering Intelligence Mon 17 Jun 01:30 p.m.
Tutorial: Mohit Prabhushankar · Ghassan AlRegib
Robustness at Inference: Towards Explainability, Uncertainty, and Intervenability
Neural networks provide generalizable and task-independent representation spaces that have found widespread applicability in image understanding. The complicated semantics of feature interactions within image data have been broken down into sets of non-linear functions, convolution parameters, attention mechanisms, and multi-modal inputs, among others. The complexity of these operations has introduced multiple vulnerabilities in neural network architectures, including adversarial and out-of-distribution samples, confidence calibration issues, and catastrophic forgetting. Given that AI promises to herald the fourth industrial revolution, it is critical to understand and overcome these vulnerabilities by building robust neural networks to drive AI systems. Defining robustness, however, is not trivial: simple measurements of invariance to noise and perturbations do not carry over to real-life settings. In this tutorial, we take a human-centric approach to understanding robustness in neural networks that allows AI systems to function in society. Doing so allows us to state the following: 1) neural networks must provide contextual and relevant explanations to humans; 2) neural networks must know when and what they don't know; 3) neural networks must be amenable to human intervention at the decision-making stage. These three statements call for robust neural networks to be explainable, equipped with uncertainty quantification, and intervenable.
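Two common ingredients behind these requirements are sketched below (illustrative, not the presenters' specific methods): a gradient-based saliency map as a simple explanation and Monte-Carlo dropout as a simple uncertainty estimate; the model and inputs are placeholders.

```python
# Minimal sketch: gradient-based saliency for explanations and Monte-Carlo
# dropout for uncertainty (placeholders for `model` and `image`).
import torch
import torch.nn.functional as F

def saliency_map(model, image, target_class):
    """Gradient of the target logit w.r.t. the input pixels."""
    image = image.clone().requires_grad_(True)
    logits = model(image.unsqueeze(0))
    logits[0, target_class].backward()
    return image.grad.abs().max(dim=0).values   # [H, W] importance map

def mc_dropout_predict(model, image, n_samples=20):
    """Keep dropout active at test time; the spread across samples is a
    simple epistemic-uncertainty estimate."""
    model.train()                                # keeps dropout layers active
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(image.unsqueeze(0)), dim=1)
                             for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)   # prediction, uncertainty
```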
Workshop on Graphic Design Understanding and Generation (GDUG) Mon 17 Jun 01:30 p.m.
2nd Workshop and Challenge on DeepFake Analysis and Detection Mon 17 Jun 01:30 p.m.
Tutorial: Yanwei Fu · Francesco Locatello · Tianjun Xiao · Tong He · Ke Fan
Object-centric Representations in Computer Vision
This tutorial discusses the evolution of object-centric representations in computer vision and deep learning. Initially inspired by decomposing visual scenes into surfaces and objects, recent developments focus on learning causal variables from high-dimensional observations like images or videos. The tutorial covers the objectives of object-centric learning (OCL), its development, and its connections with other machine learning fields, emphasizing object-centric approaches, especially in unsupervised segmentation. Advances in encoders, decoders, and self-supervised learning objectives are explored, with a focus on real-world applications and challenges. The tutorial also introduces open-source tools and showcases breakthroughs in video-based object-centric learning. The tutorial comprises four talks covering the basic ideas, learning good features for object-centric learning, video-based object-centric representation, and more diverse real-world applications.
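For readers unfamiliar with OCL mechanics, the sketch below shows one iteration of a Slot Attention-style update, a widely used building block in this area (illustrative only; real models wrap it in an encoder/decoder and train it with reconstruction or self-supervised objectives).

```python
# Minimal sketch of one Slot Attention-style iteration: slots compete over
# input locations, then each slot is updated from its aggregated evidence.
import torch
import torch.nn as nn

class SlotAttentionStep(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, slots, inputs):
        # slots: [B, K, D] object slots; inputs: [B, N, D] image features.
        q, k, v = self.to_q(slots), self.to_k(inputs), self.to_v(inputs)
        attn = torch.einsum('bkd,bnd->bkn', q, k) * self.scale
        # Softmax over *slots*: slots compete to explain each input location.
        attn = attn.softmax(dim=1) + 1e-8
        attn = attn / attn.sum(dim=-1, keepdim=True)   # normalize per slot
        updates = torch.einsum('bkn,bnd->bkd', attn, v)
        # Recurrent update of each slot from its aggregated evidence.
        B, K, D = slots.shape
        slots = self.gru(updates.reshape(B * K, D), slots.reshape(B * K, D))
        return slots.reshape(B, K, D)
```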
Tutorial: Orhun Aydin · Philipe Ambrozio Dias · Dalton Lunga
Geospatial Computer Vision and Machine Learning for Large-Scale Earth Observation Data
The 5Vs of big data -- volume, value, variety, velocity, and veracity -- pose immense opportunities and challenges for building local and planet-wide solutions from Earth observation (EO) data. EO data, which sit at the center of various multidisciplinary problems, are primarily obtained through satellite imagery, aerial photography, and UAV-based platforms. Understanding Earth observation data unlocks this immense data source for addressing planet-scale problems with computer vision and machine learning techniques for geospatial analysis. This tutorial introduces current EO data sources, problems, and image-based analysis techniques. The most recent advances in data, models, and the open-source analysis ecosystem related to computer vision and deep learning for EO data will be introduced.
Fifth Workshop on Neural Architecture Search Mon 17 Jun 01:30 p.m.
Tutorial: Fabricio Narcizo · Elizabete Munzlinger · Anuj Dutt · Shan Shaffi · Sai Narsi Reddy Donthi Reddy
Edge AI in Action: Practical Approaches to Developing and Deploying Optimized Models
Edge AI refers to artificial intelligence deployed on edge devices such as smartphones, tablets, laptops, cameras, sensors, and drones. It enables these devices to handle AI tasks autonomously, without cloud or central-server connections, offering higher speed, lower latency, greater privacy, and reduced power consumption. Edge AI presents challenges and opportunities in model development and deployment, including size reduction, compression, quantization, and distillation, and involves integration and communication between edge devices and the cloud or other devices in hybrid, distributed architectures. This tutorial provides practical guidance on developing and deploying optimized models for edge AI, covering theoretical and technical aspects, best practices, and real-world case studies focused on computer vision and deep learning models. We will demonstrate tools and frameworks such as TensorFlow, PyTorch, ONNX, OpenVINO, Google Mediapipe, and Qualcomm SNPE. We will also discuss multi-modal AI applications such as head pose estimation, person segmentation, hand gesture recognition, sound localization, and more. These applications use images, videos, and sounds to create interactive edge AI experiences. The presentation will include developing and deploying these models on Jabra collaborative business cameras and exploring integration with devices like the Luxonis OAK-1 MAX, the Neural Compute Engine Myriad X, and the NVIDIA Jetson Nano Developer Kit.
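Two standard deployment steps of the kind discussed above are sketched below as a generic illustration (not Jabra- or Qualcomm-specific tooling): exporting a PyTorch model to ONNX and applying post-training dynamic quantization; the MobileNetV2 backbone and file name are placeholders.

```python
# Minimal sketch: ONNX export plus post-training dynamic quantization of a
# placeholder PyTorch model (generic illustration for edge deployment).
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights=None).eval()

# 1. Export to ONNX so runtimes such as ONNX Runtime or OpenVINO can load it.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "mobilenet_v2.onnx",
                  input_names=["image"], output_names=["logits"],
                  dynamic_axes={"image": {0: "batch"}})

# 2. Post-training dynamic quantization of the linear layers to int8,
#    shrinking the model and speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
print(quantized(dummy).shape)   # sanity check: [1, 1000]
```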