Exploring Synthetic data as an Enterprise Capability for Training and Validating CV Systems

Nathan Kundtz · Matt Robinson · Dan Hedges

East 18


With the rise of edge computing, the growth of remote sensing information, and the widespread adoption of computer vision systems across retail and manufacturing markets, organizations increasingly rely on the accuracy and reliability of Artificial Intelligence and Machine Learning systems trained to analyze and extract information from data captured by physical sensors and sensor platforms. Real datasets often fail to capture rare events or assets and are often inaccurately labeled, and collecting real sensor data can raise cost, privacy, security, and safety issues.

Synthetic data offers the opportunity to design and label datasets for specific algorithmic training needs. Synthetic imagery designed to emulate ground-based video systems or remotely sensed satellite imagery, for example, can be generated to show real-world locations populated with objects that are hard to find or that don’t yet exist. Accurately labeled, simulated datasets can be created to fit a wide range of real-world scenarios in which AI/ML systems will be deployed, enabling teams to train and test these systems before they are deployed in production environments.

This tutorial will introduce creating, using, and iterating on synthetic data with the open synthetic data platform. We will also feature a demonstration using NVIDIA Omniverse Replicator in the AWS cloud. The tutorial will define physics-based synthetic data, discuss how it differs from Generative AI, and introduce concepts for designing synthetic data.
