The Full Stack of Physical AI: Simulation, Foundation Models, and Edge Deployment for Next-Generation Robotics Applications
Abstract
Robotics and Physical AI have become a strong and growing topic at CVPR, driven in particular by computer vision advances in Vision-Language Models (VLMs) and Vision-Language-Action (VLA) models, which have become key research areas in recent years. The community has developed several Vision-Language-Action models such as GR00T, π0, OpenVLA, SmolVLA, and ACT. However, building a complete robotics pipeline, from data collection to model training to deployment, remains a challenging multi-disciplinary endeavor. Data collection often requires expensive hardware and software, which has kept many researchers from pursuing this path. Foundation models require careful architecture design and post-training strategies. Deploying models on edge devices demands hardware-aware optimizations to achieve real-time performance. This tutorial bridges these gaps by providing a hands-on, end-to-end walkthrough of the full Physical AI stack. By the end, attendees will understand the high-level frameworks, tools, and open-source community activities around robotics, embedded devices, and model training, enabling researchers, industry partners, and communities worldwide to collaborate more effectively in this complex and growing field.