Computer Vision at Scale: Multi-Camera Tracking, Calibration, and Event Detection for Checkout-Free Retail
Checkout-free retail represents one of the most challenging real-world computer vision deployments, requiring reliable performance across hundreds of sites and millions of interactions. This tutorial bridges academic research and production deployment by framing checkout-free retail as a canonical large-scale multi-camera vision systems problem. The tutorial focuses on three foundational pillars, presented as generalizable computer vision problems rather than application-specific solutions. Each component is discussed in terms of its underlying formulations, scalability constraints, failure modes, and design tradeoffs that transfer directly to autonomous driving, smart spaces, sports analytics, warehouses, and urban sensing.
Automatic Multi-Camera Calibration: Continuous online estimation addressing drift and partial failures using deep learning and conventional CV pipelines.
Real-Time Multi-Camera Tracking: Global data association under asynchronous, unreliable observations via integer programming and graph-based formulations.
Structured Event Detection: Inference from partial visual evidence with edge computing, Kubernetes deployment, and CPU/GPU/TPU optimization.
A central theme is how infrastructure constraints (including limited bandwidth, latency requirements, camera reliability, and edge computing budgets) fundamentally shape algorithmic and architectural decisions. Attendees will learn how classical and modern deep learning techniques are adapted for continuous online operation, partial observability, and 99.9%+ system reliability.
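The data-association step named in the tracking pillar can be illustrated with a toy example. The sketch below is not the tutorial's formulation: it assumes tracks and detections have already been projected to a shared ground plane, uses Euclidean distance as the cost, and solves the resulting small 0/1 assignment problem by exhaustive search instead of an integer-programming solver. The function name `associate` and the gating parameter `max_dist` are hypothetical.

```python
import math
from itertools import permutations


def associate(tracks, detections, max_dist=1.0):
    """Assign each track to at most one detection, minimising total cost.

    tracks, detections: lists of (x, y) ground-plane points (assumed input).
    max_dist: gating radius; it is also the cost charged for leaving a
              track unmatched (a hypothetical modelling choice).
    Returns a dict mapping track index -> detection index, or None.
    """
    n_t, n_d = len(tracks), len(detections)
    best_cost, best = math.inf, {}
    # Slots >= n_d are dummy slots that mean "track left unmatched".
    for perm in permutations(range(n_d + n_t), n_t):
        cost = 0.0
        for t, d in enumerate(perm):
            if d >= n_d:
                cost += max_dist  # penalty for an unmatched track
            else:
                dist = math.dist(tracks[t], detections[d])
                # Pairs outside the gate are forbidden (infinite cost).
                cost += dist if dist <= max_dist else math.inf
        if cost < best_cost:
            best_cost = cost
            best = {t: (d if d < n_d else None) for t, d in enumerate(perm)}
    return best


tracks = [(0.0, 0.0), (5.0, 5.0)]
detections = [(5.1, 4.9), (0.2, -0.1), (9.0, 9.0)]
print(associate(tracks, detections))  # {0: 1, 1: 0}
```

Exhaustive search is only workable for a handful of tracks; at realistic scale the same objective would be handed to a dedicated assignment or integer-programming solver, and the cost would typically combine appearance and motion cues with geometry.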
Analytic understanding of diffusion models
This tutorial is designed to provide the required mathematical background to develop the intuition behind generalization of deep diffusion models. Starting from the fundamentals of diffusion, we build up to the current understanding of their generalization mechanisms in the most recent papers in the field.
Recent Advances in AI for Medical Imaging: Progress, Challenges, and Future Directions
Recent advances in artificial intelligence (AI) have significantly transformed medical imaging, enabling substantial progress in image acquisition, reconstruction, diagnosis, prognosis, and clinical decision support. Advances in deep learning, foundation models, multimodal integration, and generative modeling have improved accuracy and robustness across modalities such as MRI, CT, X-ray, ultrasound, and digital pathology. Despite these successes, critical challenges remain, including limited generalizability and interpretability, data heterogeneity, privacy concerns, and barriers to real-world clinical deployment. This tutorial provides a concise and up-to-date overview of recent advances in AI for medical imaging, reviewing key paradigms such as physics-informed and interpretable learning, privacy-preserving collaborative learning, and open-source medical imaging foundation models. We further discuss open challenges and future research directions shaping the next generation of medical imaging AI.
Extending Computer Vision to Hidden Objects: A Tutorial on Millimeter-Wave Imaging and Reconstruction of Occluded Scenes
Millimeter-wave (mmWave) signals, such as those used in 5G, 6G, and next-generation WiFi,
are a unique modality that can travel through many everyday occlusions (e.g., cardboard, fabric, fog),
allowing them to sense objects or scenes that are hidden from view.
This unique capability has sparked recent interest in the computer vision community and beyond for using these
signals to enable novel perception tasks with applications spanning autonomous driving, robotics, shipping and
logistics, and more. The goal of this tutorial is to introduce audience members to this modality,
and equip them with the knowledge needed to begin research in this area.
We will cover both fundamental millimeter-wave imaging concepts and recent, state-of-the-art methods.
We will discuss different applications of millimeter-wave sensing, including a deep-dive into two areas:
through-occlusion 3D object reconstruction and all-weather scene understanding. We will additionally cover
existing datasets, benchmarks, and tools to help newcomers get started. This tutorial is designed to be
accessible to an audience with no prior millimeter-wave experience.
Topics to be covered include:
How millimeter-wave signals differ from visible light
How millimeter-wave signals differ from other through-occlusion modalities (e.g., X-ray, ultrasound)
Various applications of mmWave sensing, including how they have been used in the CV community
Classical methods for using mmWave signals to produce a 2D or 3D image
Limitations of classical mmWave imaging
State-of-the-art methods for using mmWave signals to perform surface normal estimation for 3D object reconstruction
State-of-the-art methods for using mmWave signals for complete scene reconstruction, segmentation, and object detection
How researchers can get started in this area, including existing datasets, benchmarks, and tools
All You Need To Know About Self-Driving
The Full Stack of Physical AI: Simulation, Foundation Models, and Edge Deployment for Next-Generation Robotics Applications
The robotics and Physical AI space has been a strong and growing topic at CVPR, especially with computer vision advancements in VLM and VLA models that have become key research areas in recent years. The community has developed several Vision-Language-Action models such as GR00T, π0, OpenVLA, SmolVLA, and ACT.
However, building a complete robotics pipeline, from data collection to model training to deployment, remains a challenging multi-disciplinary endeavor. Data collection often requires expensive hardware and software solutions, which has prevented many researchers from pursuing this path. Foundation models require careful architecture design and post-training strategies. And deploying models on edge devices demands hardware-aware optimizations to achieve real-time performance.
This tutorial bridges these gaps by providing a hands-on, end-to-end walkthrough of the full Physical AI stack. By the end, attendees will understand the high-level frameworks, tools, and open-source community activities around robotics, embedded devices, and model training, enabling researchers, industry partners, and communities worldwide to improve collaborations in this complex and growing field.
Video Generative Models: Benchmarks and Evaluation
Multimodal Algorithmic Reasoning Workshop
The Seventh Annual Embodied Artificial Intelligence Workshop
Geometry-Free Novel View Synthesis and Controllable Video Models
Multi-Agent Embodied Intelligent Systems Meet Agentic-AI era: Opportunities, Challenges and Futures
9th Multimodal Learning and Applications Workshop
2nd Workshop on Agents in Interaction, from Humans to Robots
6th Omnidirectional Computer Vision Workshop
Personalization in Generative AI Workshop
6th Workshop on CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling
3D Geometry Generation for Scientific Computing (2nd Edition)
The 3rd Workshop on New Trends in AI-Generated Media and Security
Open-World Vision
Open-World Vision (OWV) emphasizes realistic opportunities and challenges in developing and deploying computer vision systems in the dynamic, vast, and unpredictable real open world, which offers abundant data that can benefit training and challenge testing. It contrasts with the traditional "closed-world" paradigm of visual learning and inference, which assumes fixed, known data distributions and categorical labels. Models developed under such closed-world assumptions tend to be brittle when encountering ever-changing and novel scenarios in the real open world. Modern visual learning has shifted towards an open-world paradigm, such as pretraining foundation models on massive data sourced from the open world (e.g., web-sourced data). While these models show unprecedented performance and strong adaptability to downstream tasks, they inherit biases from their open-world pretraining data and can still fail in truly novel or underrepresented scenarios during deployment. This workshop aims not only to uncover current limitations, potential risks, emerging opportunities, and unresolved challenges of open-world vision, but also to solicit solutions that advance the field toward more robust, fair, and adaptable visual systems.
Embodied Reasoning in Action: Workshop and Challenge on Embodied Reasoning for Robotic Manipulation
4th Workshop on Maritime Computer Vision
2nd Workshop on Computer Vision for Children
Exploring the Next Generation of Data
The Second Workshop on the Evaluation of the Generative Foundation Models
PhysHuman: Physically Grounded Human Perception and Modeling
VizWiz Grand Challenge: Interpreting Images and Videos Taken by Blind People
Workshop on Visual Concepts
11th Workshop on Computer Vision and Multimodal Microscopy Image Analysis
Trustworthy, Robust, Uncertainty-Aware, and Explainable Visual Intelligence and Beyond
2nd Workshop on Knowledge-Intensive Multimodal Reasoning
6th Workshop on 3D Scene Understanding for Vision, Graphics, and Robotics
How Do Vision Models Work?
Sight and Sound
2nd Workshop on Human-Interactive Generation and Editing
Unified Robotic Vision with Cross-Modal Sensing and Alignment
SPAR-3D: Security, Privacy, and Adversarial Robustness in 3D Generative Vision Models
4th Workshop on Generative Models for Computer Vision
11th New Trends in Image Restoration and Enhancement Workshop and Challenges
Third Workshop for Learning 3D with Multi-View Supervision
Safe Artificial Intelligence for All Domains
EarthVision: Large Scale Computer Vision for Remote Sensing Imagery
From Perception to Persuasion: Challenges and Advances in Misinformation Detection in Society
Humans of Generative AI
9th International Workshop on Visual Odometry and Computer Vision Applications Based on Location Clues
Mobile AI workshop and associated challenges, 6th edition
The 8th UG2+ Workshop and Challenge: Bridging the Gap between Computational Photography and Visual Perception
2nd Workshop on Video Large Language Models
The 5th Workshop on Computer Vision in the Wild: Towards Unified Multimodal Agents For Reasoning in the Wild
The Eighth Workshop on Precognition: Seeing through the Future
The 1st Workshop on Low‑Level Vision Frontiers with Generative AI, Preference Optimization, and Agentic Systems
12th IEEE International Workshop on Computer Vision in Sports
The 6th Workshop of Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents
Workshop on Any-to-any Multimodal Learning
Foundations and Frontiers of Watermarking: Algorithms, Multimodal Extensions, Benchmarks, and Authenticity Frameworks
Invisible watermarking has re-emerged as a critical pillar of trustworthy AI and media authenticity in the era of generative models. This tutorial provides a unified end-to-end treatment of watermarking, spanning classical signal-processing theory, modern deep-learning methods, multimodal extensions (image, video, audio, 3D), robustness benchmarking (WAVES), and real-world deployment within provenance ecosystems such as C2PA and Content Credentials.
From Perception to Action: Building Efficient and Deployable Robot Intelligence Pipelines with Open-Source Edge AI Toolkits
Resource for CVPR 2026 tutorial on "From Perception to Action: Building Efficient and Deployable Robot Intelligence Pipelines with Open-Source Edge AI Toolkits"
The Road to Convergence: Evolution of Unified Multimodal Models
This tutorial traces the evolution of multimodal AI from isolated expertise to Unified Multimodal Models (UMMs). We introduce the core motivations driving unification, particularly the mutual reinforcement between understanding and generation, and provide a rigorous definition of UMMs.
Third Workshop on Simulation for Autonomous Driving
2nd Workshop on 4D Vision: Modeling the Dynamic World
Bridging AI and Medical Reality: Computer Vision for Real-world Clinical Translation
Appearance Understanding and Generation
Computer Vision × Education: Building a Cross‑Community Agenda for Multimodal Vision in Classrooms
8th International Workshop on Large Scale Holistic Video Understanding
ScaleBot: The First Workshop on Scalable Robot Learning Systems
The 2nd Workshop on Multi-Modal Reasoning for AI Agents
Artificial Intelligence for Space
4D World Models: Bridging Generation and Reconstruction
4D Digital Twins: Real-to-Sim-to-Real for Physical AI
2nd Workshop on GenAI for Storytelling
See the World in a Different Light: Physical Appearance Modeling and Relighting in the Age of Generative AI
Medical Reasoning with Vision Language Foundation Models
Domain Generalization: Evolution, Breakthroughs, and Future Horizons (2nd Edition)
The 2nd CVPR Workshop Proposal on Foundation Models Meet Embodied Agents
CVPR 2026 Biometrics Workshop
Big Model Adaptation In Computer Vision
Eighth Workshop on Image Matching: Local Features and Beyond
The 7th International Workshop on Eye and Gaze in Computer Vision
Pixel-level Video Understanding in the Wild Challenge
Second Workshop on Skilled Activity Understanding, Assessment & Feedback Generation
Imagine a world where computer vision-based systems can analyze a video of an athlete, a surgeon, a patient, or a factory worker and instantly provide expert-level actionable feedback: correcting techniques, identifying inefficiencies, and helping people refine their skills in real time. Thanks to rapid progress in video understanding, this vision is becoming reality. AI-powered systems can now analyze complex human activities, assess performance, and generate intelligent feedback, unlocking new possibilities in sports, healthcare, manufacturing, education, rehabilitation, and beyond. Through Expert Keynotes and Invited Contributions, this CVPR 2026 workshop will explore the cutting edge of skilled activity understanding, assessment, and feedback generation, bridging research and real-world applications.
As AI systems become more capable of understanding human expertise, the implications are profound, empowering individuals with personalized coaching, democratized skill development, and scalable training solutions. We invite researchers, industry leaders, and practitioners to join us in shaping the future of AI-powered skill understanding. Whether working on foundational research, applied solutions, or real-world deployment, this workshop is an opportunity and forum to learn about and push the boundaries of how AI perceives, evaluates, and enhances human ability.