

Registration Desk
7:00 AM - 5:00 PM
Workshop

HYBRID: Room Arch 204, Seattle Convention Center
SCHEDULE: https://www.cvlai.net/ntire/2024/#schedule
Mon Jun 08:00 - 18:00 PDT

Lunch Break: 12:00-13:00
Poster session: 16:00-18:00

Workshop

Efficient Large Vision Models

Auke Wiggers · Amirhossein Habibian
8:00 AM - 12:35 PM
Workshop

2nd Workshop on Foundation Models

Hisham Cholakkal · Teng Xi
8:30 AM - 5:30 PM
Workshop
8:30 AM - 5:30 PM

In the past decade, deep learning has been advanced mainly by training ever-larger models on ever-larger datasets, at the price of massive computation and expensive hardware for training. As a result, research on designing state-of-the-art models is gradually being monopolized by large companies, while research groups with limited resources, such as universities and small companies, are unable to compete.
Reducing the training dataset size while preserving training effectiveness is therefore significant for reducing training costs, enabling green AI, and encouraging university research groups to engage in the latest research.
This workshop focuses on the emerging research field of dataset distillation (DD), which aims to compress a large training dataset into a tiny, informative one (e.g., 1% of the size of the original data) while maintaining the performance of models trained on it. Besides general-purpose efficient model training, dataset distillation can also greatly facilitate downstream tasks: neural architecture/hyperparameter search, by speeding up model evaluation; continual learning, by producing compact memory; federated learning, by reducing data transmission; and privacy-preserving learning, by avoiding the release of raw private data. Dataset distillation is also closely related to research topics including core-set selection, prototype generation, active learning, few-shot learning, generative models, and the broad area of learning from synthetic data.




Although DD has become an important paradigm for various machine learning tasks, its potential in computer vision (CV) applications, such as face recognition, person re-identification, and action recognition, is far from fully exploited.
Moreover, DD has rarely been demonstrated effectively on advanced computer vision tasks such as object detection, image segmentation, and video understanding.
Further, numerous unexplored challenges and unresolved issues remain in the realm of DD.
One such challenge is finding efficient ways to modify existing DD workflows, or to create entirely new ones, that address a wide range of computer vision tasks beyond mere image classification.
Another lies in improving the scalability of DD methods so they can compress real-world datasets beyond the scale of ImageNet.

The purpose of this workshop is to unite researchers and professionals who share an interest in dataset distillation for computer vision, and to develop the next generation of DD methods for computer vision applications.
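To make the goal concrete, here is a deliberately tiny sketch of the prototype-generation flavor of distillation mentioned above: each class of a toy dataset is compressed to a single prototype, and a nearest-prototype classifier is trained on the distilled set. The dataset, names, and numbers are illustrative assumptions; real DD methods instead *learn* synthetic samples, e.g. by gradient matching.

```python
# Illustrative sketch only: compress each class of a toy 2-D dataset to a
# single prototype (the class mean) and classify by nearest prototype.
from collections import defaultdict
import math

def distill_to_prototypes(samples):
    """samples: list of (feature_vector, label) -> {label: prototype}."""
    sums, counts = defaultdict(lambda: None), defaultdict(int)
    for x, y in samples:
        sums[y] = list(x) if sums[y] is None else [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def classify(prototypes, x):
    """Nearest-prototype classifier trained on the distilled set."""
    return min(prototypes, key=lambda y: math.dist(prototypes[y], x))

# Toy dataset: two well-separated 2-D clusters.
data = [([0.0, 0.1], "a"), ([0.2, -0.1], "a"),
        ([5.0, 5.1], "b"), ([4.8, 4.9], "b")]
protos = distill_to_prototypes(data)   # 4 samples -> 2 prototypes
print(classify(protos, [0.1, 0.0]))    # lands in cluster "a"
```

The distilled set here is 50% of the original only because the toy dataset is tiny; the same mechanism scales to the 1% regime the description targets.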

Workshop

AI for 3D Generation

Despoina Paschalidou
8:30 AM - 5:30 PM
Workshop

AI for Content Creation (AI4CC)

James Tompkin · Deqing Sun
8:30 AM - 5:30 PM
Tutorial

Machine Unlearning in Computer Vision: Foundations and Applications

Sijia Liu · Yang Liu · Nathalie Baracaldo · Eleni Triantafillou
9:00 AM - 12:00 PM

This tutorial aims to offer a comprehensive understanding of emerging machine unlearning (MU) techniques. These techniques are designed to accurately assess the impact of specific data points, classes, or concepts (e.g., related to copyrighted information, biases and stereotypes, and personally identifying data) on model performance and efficiently eliminate their potentially harmful influence within a pre-trained model. With the recent shift to foundation models, MU has become indispensable, as re-training from scratch is prohibitively costly in terms of time, computational resources, and finances. Despite increasing research interest, MU for vision tasks remains significantly underexplored compared to its prominence in the security and privacy (SP) field. Within this tutorial, we will delve into the algorithmic foundations of MU methods, including techniques such as localization-informed unlearning, unlearning-focused finetuning, and vision model-specific optimizers. We will provide a comprehensive and clear overview of the diverse range of applications for MU in CV. Furthermore, we will emphasize the importance of unlearning from an industry perspective, where modifying the model during its life-cycle is preferable to re-training it entirely, and where metrics to verify the unlearning process become paramount. Our tutorial will furnish the general audience with sufficient background information to grasp the motivation, research progress, opportunities, and ongoing challenges in MU.

Tutorial

Recent Advances in Vision Foundation Models

Zhengyuan Yang · Linjie Li · Zhe Gan · Chunyuan Li · Jianwei Yang
9:00 AM - 5:00 PM

This tutorial covers the advanced topics in designing and training vision foundation models, including the state-of-the-art approaches and principles in (i) learning vision foundation models for multimodal understanding and generation, (ii) benchmarking and evaluating vision foundation models, and (iii) agents and other advanced systems based on vision foundation models.

Tutorial

SCENIC: An Open-Source Probabilistic Programming System for Data Generation and Safety in AI-Based Autonomy

Edward Kim · Sanjit Seshia · Daniel Fremont · Jinkyu Kim · Kimin Lee · Hazem Torfah · Necmiye Ozay · Parasara Sridhar Duggirala · Marcell Vazquez-Chanlatte
9:00 AM - 12:00 PM

Autonomous systems, such as self-driving cars or intelligent robots, increasingly operate in complex, stochastic environments where they dynamically interact with multiple entities (human and robot). There is a need to formally model and generate such environments in simulation, for use cases that span synthetic training-data generation and rigorous evaluation of safety. In this tutorial, we provide an in-depth introduction to Scenic, a simulator-agnostic probabilistic programming language for modeling complex multi-agent physical environments with stochasticity and spatio-temporal constraints. Scenic has been used in a variety of domains such as self-driving, aviation, indoor robotics, multi-agent systems, and augmented/virtual reality. Using Scenic and associated open-source tools, one can (1) model and sample from distributions with spatial and temporal constraints, (2) generate synthetic data in a controlled, programmatic fashion to train and test machine learning components, (3) reason about the safety of AI-enabled autonomous systems, (4) automatically find edge cases, (5) debug and root-cause failures of AI components, including for perception, and (6) bridge the sim-to-real gap in autonomous system design. We will provide hands-on coverage of the basics of Scenic and its applications: how to create Scenic programs, how to build your own applications on top of Scenic, and how to interface the language to your simulator/renderer of choice. For more information on Scenic, please visit the website: https://scenic-lang.org
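Scenic has its own syntax, not shown here; purely to illustrate the underlying idea of item (1), sampling scenes from distributions subject to spatial constraints, here is a hypothetical rejection-sampling sketch in plain Python. The scene, constraint, and numbers are assumptions for illustration only.

```python
# Illustrative Python sketch (not Scenic syntax): sample a two-car scene whose
# positions come from distributions but must satisfy a declared spatial
# constraint -- here, a minimum separation between the vehicles.
import math
import random

def sample_scene(rng, min_gap=5.0, max_tries=1000):
    """Rejection-sample (ego, other) positions on a 100 m road segment."""
    for _ in range(max_tries):
        ego = (rng.uniform(0, 100), rng.uniform(-1, 1))     # (x, lane offset)
        other = (rng.uniform(0, 100), rng.uniform(-1, 1))
        if math.dist(ego, other) >= min_gap:                # spatial constraint
            return ego, other
    raise RuntimeError("constraint too tight to satisfy")

rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(100)]
print(all(math.dist(e, o) >= 5.0 for e, o in scenes))  # constraint holds
```

A Scenic program states such requirements declaratively and leaves the sampling strategy to the system, which is what makes it simulator-agnostic.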

Workshop

AI4Space 2024

Tat-Jun Chin
9:00 AM - 12:10 PM
Workshop
9:00 AM - 6:00 PM

The CVPR 2024 Workshop on Autonomous Driving (WAD) brings together leading researchers and engineers from academia and industry to discuss the latest advances in autonomous driving. Now in its 7th year, the workshop has been continuously evolving with this rapidly changing field and now covers all areas of autonomy, including perception, behavior prediction and motion planning. In this full-day workshop, our keynote speakers will provide insights into the ongoing commercialization of autonomous vehicles, as well as progress in related fundamental research areas. Furthermore, we will host a series of technical benchmark challenges to help quantify recent advances in the field, and invite authors of accepted workshop papers to present their work.

Workshop

Sight and Sound

Andrew Owens
9:00 AM - 6:00 PM
Tutorial
9:00 AM - 12:00 PM

For decades, stereo matching was approached with hand-crafted algorithms focused on measuring the similarity between local patterns in the two images and propagating this information globally. Since 2015, deep learning has led to a paradigm shift in this field, driving the community toward end-to-end deep networks capable of matching pixels. This revolution brought stereo matching to a whole new level of accuracy, yet not without drawbacks. Indeed, some hard challenges remained unsolved by the first generation of deep stereo models, as they were often unable to generalize properly across different domains -- e.g., from synthetic to real, from indoor to outdoor -- or to deal with high-resolution images.

This was, however, three years ago. These and other challenges have since been faced by the research community, making deep stereo matching even more mature and suitable as a practical solution for everyday applications. For instance, we now have networks capable of generalizing much better from synthetic to real images, as well as handling high-resolution images or even estimating disparity correctly in the presence of non-Lambertian surfaces -- known to be among the ill-posed challenges for stereo. Accordingly, in this tutorial we aim to give a comprehensive overview of the state of the art in deep stereo matching: which architectural designs have been crucial to reach this level of maturity, and how to select the best solution for estimating depth from stereo in real applications.

Workshop

Prompting in Vision

Amir Bar · Kaiyang Zhou
9:00 AM - 5:30 PM
Tutorial

Disentanglement and Compositionality in Computer Vision

Xin Jin · Wenjun Zeng · Tao Yang · Yue Song · Nicu Sebe · Xingyi Yang · Xinchao Wang · Shuicheng Yan
9:00 AM - 12:00 PM

This tutorial explores the concepts of disentanglement and compositionality in computer vision. These concepts play a crucial role in enabling machines to understand and interpret visual information with more sophistication and human-like reasoning. Participants will learn about advanced techniques and models that allow for the disentanglement of visual factors in images and the composition of these factors into more meaningful representations. All in all, disentanglement and compositionality are believed to be among the possible paths for AI to fundamentally understand the world and eventually achieve artificial general intelligence (AGI).
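One concrete building block behind many disentanglement methods (e.g. beta-VAE-style objectives) is a per-dimension KL penalty pushing each latent toward an isotropic Gaussian prior. The sketch below shows only that regularizer in closed form, under the assumption of a diagonal-Gaussian posterior; the reconstruction term and the encoder/decoder are omitted.

```python
# Closed-form KL(N(mu, exp(log_var)) || N(0, 1)), per latent dimension,
# scaled by beta as in beta-VAE-style disentanglement objectives.
import math

def kl_to_standard_normal(mu, log_var):
    """KL divergence to the standard normal prior, one value per dimension."""
    return [0.5 * (math.exp(lv) + m * m - 1.0 - lv)
            for m, lv in zip(mu, log_var)]

def beta_vae_regularizer(mu, log_var, beta=4.0):
    return beta * sum(kl_to_standard_normal(mu, log_var))

# A latent matching the prior pays no penalty; a shifted one does.
print(beta_vae_regularizer([0.0, 0.0], [0.0, 0.0]))   # 0.0
print(beta_vae_regularizer([1.0, 0.0], [0.0, 0.0]))   # 4.0 * 0.5 = 2.0
```

Weighting this term with beta > 1 pressures each latent dimension to carry information independently, which is one route to disentangled factors.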

Tutorial

Efficient Homotopy Continuation for Solving Polynomial Systems in Computer Vision Applications

Benjamin Kimia · Timothy Duff · Ricardo Fabbri · Hongyi Fan
1:30 PM - 6:00 PM

Minimal problems and their solvers play an important role in RANSAC-based approaches to several estimation problems in vision. Minimal solvers solve systems of equations, depending on data, which obey a “conservation of number” principle: for sufficiently generic data, the number of solutions over the complex numbers is constant. Homotopy continuation (HC) methods exploit not just this conservation principle but also the smooth dependence of solutions on problem data. The classical solution of polynomial systems using Gröbner bases, resultants, elimination templates, etc. has been largely successful on smaller problems, but these methods cannot tackle larger polynomial systems with larger numbers of solutions. While HC methods can solve such problems, they have been notoriously slow. Recent research by the presenters and other researchers has enabled efficient HC solvers capable of real-time solutions.

The main objective of this tutorial is to make this technology more accessible to the computer vision community. Specifically, after an overview of how such methods can be useful for solving problems in vision (e.g., absolute/relative pose, triangulation), we will describe some of the basic theoretical apparatus underlying HC solvers, including both local and global “probability-1” aspects. On the practical side, we will describe recent advances enabled by GPUs and learning-based approaches, and how to build your own HC-based minimal solvers.
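The core predict-then-correct loop of HC can be shown on a toy univariate example: track the roots of a start system g(x) = x² − 1 (roots known: ±1) to the target f(x) = x² − 2 along the straight-line homotopy H(x, t) = (1 − t)g(x) + t·f(x). This fixed-step sketch omits everything that makes real HC solvers work at scale (complex arithmetic, adaptive step sizes, the random "gamma trick").

```python
# Toy predictor-corrector homotopy continuation for x^2 - 2 = 0.
def H(x, t):
    return (1 - t) * (x * x - 1) + t * (x * x - 2)

def dH_dx(x, t):
    return 2 * x                           # same for start and target here

def dH_dt(x, t):
    return (x * x - 2) - (x * x - 1)       # = f(x) - g(x) = -1

def track(x, steps=100):
    """Follow one root of the start system from t = 0 to t = 1."""
    dt = 1.0 / steps
    t = 0.0
    for _ in range(steps):
        # Euler predictor along the solution path: dx/dt = -(dH/dt) / (dH/dx)
        x = x - dt * dH_dt(x, t) / dH_dx(x, t)
        t += dt
        for _ in range(3):                 # Newton corrector at the new t
            x = x - H(x, t) / dH_dx(x, t)
    return x

print(track(1.0))    # approx  sqrt(2) = 1.41421...
print(track(-1.0))   # approx -sqrt(2)
```

Each start root flows to one target root, which is exactly the conservation-of-number principle the description mentions; GPU solvers parallelize the tracking of many such paths.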

Workshop

Workshop on Virtual Try-On

Vidya Narayanan
1:30 PM - 6:00 PM
Tutorial
1:30 PM - 5:30 PM

Neural networks provide generalizable, task-independent representation spaces that have found widespread applicability in image-understanding applications. The complicated semantics of feature interactions within image data have been broken down into sets of non-linear functions, convolution parameters, attention mechanisms, and multi-modal inputs, among others. The complexity of these operations has introduced multiple vulnerabilities within neural network architectures, including susceptibility to adversarial and out-of-distribution samples, confidence-calibration issues, and catastrophic forgetting. Given that AI promises to herald the fourth industrial revolution, it is critical to understand and overcome these vulnerabilities; doing so requires creating robust neural networks to drive AI systems. Defining robustness, however, is not trivial: simple measurements of invariance to noise and perturbations are not applicable in real-life settings. In this tutorial, we provide a human-centric approach to understanding robustness in neural networks that allows AI systems to function in society. Doing so allows us to state the following: 1) all neural networks must provide contextual and relevant explanations to humans, 2) neural networks must know when and what they don’t know, and 3) neural networks must be amenable to human intervention at the decision-making stage. These three statements call for robust neural networks to be explainable, equipped with uncertainty quantification, and intervenable.
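One widely used post-hoc fix for the confidence-calibration issue mentioned above is temperature scaling: divide the logits by a temperature T (normally fitted on a validation set; fixed here purely for illustration) before the softmax. This softens over-confident predictions without changing the predicted class.

```python
# Sketch of temperature scaling for confidence calibration.
import math

def softmax(logits, temperature=1.0):
    z = [l / temperature for l in logits]
    m = max(z)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

logits = [4.0, 1.0, 0.0]                # an over-confident prediction
p_raw = softmax(logits)
p_cal = softmax(logits, temperature=2.0)
print(max(p_raw) > max(p_cal))          # calibration softens confidence
print(p_raw.index(max(p_raw)) == p_cal.index(max(p_cal)))  # argmax unchanged
```

Because only the confidence changes, accuracy is preserved; the gain is that reported probabilities better reflect how often the model is actually right.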

Tutorial

Object-centric Representations in Computer Vision

Yanwei Fu · Francesco Locatello · Tianjun Xiao · Tong He · Ke Fan
1:30 PM - 6:00 PM

This tutorial discusses the evolution of object-centric representations in computer vision and deep learning. Initially inspired by the decomposition of visual scenes into surfaces and objects, recent developments focus on learning causal variables from high-dimensional observations like images or videos. The tutorial covers the objectives of object-centric learning (OCL), its development, and its connections with other machine learning fields, emphasizing object-centric approaches, especially in unsupervised segmentation. Advances in encoders, decoders, and self-supervised learning objectives are explored, with a focus on real-world applications and challenges. The tutorial also introduces open-source tools and showcases breakthroughs in video-based object-centric learning. It will feature four talks covering the basic ideas, learning good features for object-centric learning, video-based object-centric representation, and more diverse real-world applications.
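As a hedged illustration of the classic "decompose the scene into objects" idea that inspired the field, the sketch below groups foreground pixels of a binary mask into 4-connected components with a BFS. Modern object-centric models (e.g. slot-based ones) learn such groupings end-to-end from pixels rather than relying on hand-crafted connectivity; the mask here is made up.

```python
# Hand-crafted scene decomposition: count connected foreground regions.
from collections import deque

def connected_components(mask):
    """mask: 2-D list of 0/1 -> number of 4-connected foreground regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                regions += 1                    # found a new "object"
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:                    # flood-fill its pixels
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions

scene = [[1, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 1, 0, 1]]
print(connected_components(scene))   # three separate "objects"
```

The limits of this heuristic (occlusion, touching objects, appearance) are what unsupervised object-centric learning tries to overcome with learned slots.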

Tutorial
1:30 PM - 5:00 PM

The 5Vs of big data (volume, value, variety, velocity, and veracity) pose immense opportunities and challenges for implementing local and planet-wide solutions from Earth observation (EO) data. EO data, which sits at the center of various multidisciplinary problems, is primarily obtained through satellite imagery, aerial photography, and UAV-based platforms. Understanding EO data unlocks this immense data source for addressing planet-scale problems with computer vision and machine learning techniques for geospatial analysis. This workshop introduces current EO data sources, problems, and image-based analysis techniques, along with the most recent advances in data, models, and the open-source analysis ecosystem for computer vision and deep learning on EO data.
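To ground what a basic EO analysis step looks like, here is a minimal sketch of NDVI, the normalized difference vegetation index, computed from red and near-infrared (NIR) reflectance. The per-pixel reflectance values below are made-up numbers for illustration.

```python
# NDVI = (NIR - red) / (NIR + red): values near +1 indicate dense vegetation;
# values near 0 or below indicate bare soil or water.
def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index, guarded against division by zero."""
    return (nir - red) / (nir + red + eps)

vegetation = ndvi(nir=0.50, red=0.08)   # healthy vegetation: strongly positive
water = ndvi(nir=0.02, red=0.05)        # water: negative
print(round(vegetation, 2), round(water, 2))
```

In practice the same formula is applied per pixel over whole satellite scenes (e.g. with array libraries), which is where the volume and velocity challenges above come in.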

Tutorial

Edge AI in Action: Practical Approaches to Developing and Deploying Optimized Models

Fabricio Narcizo · Elizabete Munzlinger · Anuj Dutt · Shan Shaffi · Sai Narsi Reddy Donthi Reddy
2:00 PM - 5:30 PM

Edge AI refers to artificial intelligence applied to edge devices like smartphones, tablets, laptops, cameras, sensors, and drones. It enables these devices to handle AI tasks autonomously, without cloud or central server connections, offering higher speed, lower latency, greater privacy, and reduced power consumption. Edge AI presents challenges and opportunities in model development and deployment, including size reduction, compression, quantization, and distillation, and involves integrating and communicating between edge devices and the cloud or other devices in a hybrid and distributed architecture. This tutorial provides practical guidance on developing and deploying optimized models for edge AI, covering theoretical and technical aspects, best practices, and real-world case studies focused on computer vision and deep learning models. We demonstrate tools and frameworks like TensorFlow, PyTorch, ONNX, OpenVINO, Google Mediapipe, and Qualcomm SNPE. We will also discuss multi-modal AI applications such as head pose estimation, person segmentation, hand gesture recognition, sound localization, and more. These applications use images, videos, and sounds to create interactive edge AI experiences. The presentation will include developing and deploying these models on Jabra collaborative business cameras and exploring integration with devices like Luxonis OAK-1 MAX, Neural Compute Engine Myriad X, and NVIDIA Jetson Nano Developer Kit.
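Of the model-optimization techniques listed above, quantization is the simplest to show end to end. The sketch below is a hypothetical illustration of symmetric per-tensor int8 post-training quantization: map float weights to int8 with a single scale, dequantize, and check the round-trip error; production toolchains (e.g. the frameworks named above) add per-channel scales, calibration, and quantization-aware training.

```python
# Symmetric per-tensor int8 quantization round trip.
def quantize_int8(weights):
    """Map floats to int8 codes with one scale; returns (codes, scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
err = max(abs(a - b) for a, b in zip(weights, restored))
print(err <= scale / 2)   # round-trip error bounded by half a quantization step
```

The payoff on an edge device is a 4x size reduction versus float32 plus faster integer arithmetic, at the cost of the bounded error shown here.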
