Poster
Reconstructing People, Places, and Cameras
Lea Müller · Hongsuk Choi · Anthony Zhang · Brent Yi · Jitendra Malik · Angjoo Kanazawa
We introduce "Humans and Structure from Motion", a novel approach for reconstructing multiple people within a metric world coordinate system from a sparse set of images capturing a scene. Our method jointly estimates human body pose, shape, camera positions, and scene structure, capturing the spatial relationships among people and their location in the environment. Unlike existing methods that require calibrated setups, our approach operates with minimal constraints by combining the strengths of human body priors and data-driven SfM. By exploiting multi-view geometry, our method is the first to effectively recover humans and scene structure without assumptions about human-scene contact. We evaluate our approach on two challenging benchmarks, EgoHumans and EgoExo4D, demonstrating significant improvements in human location estimation within the world coordinate frame (error reduced from 3.51 m to 1.04 m and from 2.9 m to 0.56 m, respectively). Notably, our results also reveal that incorporating human data in the classical SfM task improves camera pose estimation (RRA@15: 0.74 to 0.89 on EgoHumans) when multiple humans are used for correspondence. We will release our code and data.
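For readers unfamiliar with the RRA@15 figure quoted above, the following is a minimal sketch of relative rotation accuracy as it is commonly defined in camera pose estimation: the fraction of camera pairs whose relative rotation differs from the ground truth by less than a threshold (here 15 degrees). The abstract does not spell out the evaluation protocol, so this definition and the helper names (`rra_at`, `rot_z`, `matmul`, `transpose`, `rotation_angle_deg`) are assumptions for illustration, not the authors' evaluation code.

```python
import itertools
import math


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    """Transpose of a 3x3 matrix (equals the inverse for a rotation)."""
    return [list(row) for row in zip(*A)]


def rotation_angle_deg(R):
    """Rotation angle of R in degrees, from trace(R) = 1 + 2*cos(theta)."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (toy data helper)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rra_at(pred, gt, threshold_deg=15.0):
    """Fraction of camera pairs with relative rotation error below threshold.

    `pred` and `gt` are equal-length lists of 3x3 rotation matrices,
    one per camera, each expressed in its own (arbitrary) world frame.
    Assumed definition; the paper's exact protocol may differ.
    """
    hits = []
    for i, j in itertools.combinations(range(len(gt)), 2):
        rel_pred = matmul(pred[j], transpose(pred[i]))
        rel_gt = matmul(gt[j], transpose(gt[i]))
        # Angle of the residual rotation between predicted and true relative pose.
        err = rotation_angle_deg(matmul(rel_pred, transpose(rel_gt)))
        hits.append(err < threshold_deg)
    return sum(hits) / len(hits)


# Toy example: three cameras, the third one predicted 30 degrees off.
gt = [rot_z(0), rot_z(40), rot_z(80)]
pred = [rot_z(0), rot_z(40), rot_z(110)]
score = rra_at(pred, gt, threshold_deg=15.0)
```

Because the metric compares relative rotations between camera pairs, it does not depend on how the predicted and ground-truth world frames are aligned, which is why it is a convenient measure for SfM-style reconstructions.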