Computer Vision at Scale: Multi-Camera Tracking, Calibration, and Event Detection for Checkout-Free Retail
Abstract
Checkout-free retail represents one of the most challenging real-world computer vision deployments, requiring reliable performance across hundreds of sites and millions of interactions. This tutorial bridges academic research and production deployment by framing checkout-free retail as a canonical large-scale multi-camera vision systems problem. The tutorial focuses on three foundational pillars, presented as generalizable computer vision problems rather than application-specific solutions. Each component is discussed in terms of its underlying formulations, scalability constraints, failure modes, and design tradeoffs that transfer directly to autonomous driving, smart spaces, sports analytics, warehouses, and urban sensing.

Automatic Multi-Camera Calibration: Continuous online estimation addressing drift and partial failures using deep learning and conventional CV pipelines.
Real-Time Multi-Camera Tracking: Global data association under asynchronous, unreliable observations via integer programming and graph-based formulations.
Structured Event Detection: Inference from partial visual evidence with edge computing, Kubernetes deployment, and CPU/GPU/TPU optimization.

A central theme is how infrastructure constraints, including limited bandwidth, latency requirements, camera reliability, and edge computing budgets, fundamentally shape algorithmic and architectural decisions. Attendees will learn how classical and modern deep learning techniques are adapted for continuous online operation, partial observability, and 99.9%+ system reliability.
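
To make the data association pillar concrete, here is a toy sketch (not the tutorial's method) of cross-camera association posed as a small assignment problem, the simplest integer program of this kind. It assumes each camera reports detections already projected to shared ground-plane coordinates in metres. The function name `associate`, the `gate` distance, and the sample points are all hypothetical, and the brute-force search is only practical for a handful of detections; a production system would presumably use a dedicated solver.

```python
from itertools import permutations
from math import dist


def associate(points_a, points_b, gate=1.0):
    """Match detections from two cameras by minimising total distance.

    points_a, points_b: lists of (x, y) ground-plane positions (assumed inputs).
    gate: hypothetical maximum distance for a valid match; farther pairs are
          treated as unmatched and charged a flat cost of `gate`.
    Returns sorted (index_in_a, index_in_b) pairs.
    """
    # Enumerate one-to-one assignments from the smaller set into the larger one.
    swapped = len(points_a) > len(points_b)
    small, large = (points_b, points_a) if swapped else (points_a, points_b)

    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(large)), len(small)):
        # Capping each term at `gate` makes "leave unmatched" a valid choice.
        cost = sum(min(dist(small[i], large[j]), gate) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm

    pairs = []
    for i, j in enumerate(best_perm):
        if dist(small[i], large[j]) < gate:
            pairs.append((j, i) if swapped else (i, j))
    return sorted(pairs)


# Hypothetical detections from two overlapping cameras.
cam_a = [(0.0, 0.0), (2.0, 2.0)]
cam_b = [(2.1, 1.9), (0.1, -0.1), (9.0, 9.0)]
print(associate(cam_a, cam_b))  # [(0, 1), (1, 0)]
```

The third detection in `cam_b` stays unmatched because it lies outside the gate, which is one simple way to tolerate the unreliable or missing observations the abstract mentions.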