Poster
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
Guangda Ji · Silvan Weder · Francis Engelmann · Marc Pollefeys · Hermann Blum
Neural network performance scales with both model size and data volume, as shown in language and image processing. This requires scaling-friendly architectures and large datasets. While transformers have been adapted for 3D vision, a 'GPT-moment' remains elusive due to limited training data. We introduce ARKit LabelMaker, the first large-scale, real-world 3D dataset with dense semantic annotations. Specifically, we enhance ARKitScenes with automatically generated dense labels using an extended LabelMaker pipeline, tailored for large-scale pre-training. Training on this dataset improves accuracy across architectures, achieving state-of-the-art results on ScanNet and ScanNet200, with notable gains on tail classes. We compare our results with self-supervised methods and synthetic data, evaluating the effects on downstream tasks and zero-shot generalization. The dataset will be publicly available.