

Poster

PhaseScene : Dynamic Scene Generation with Phase-Specific Action Modeling for Embodied AI

Sangmin Lee · Sungyong Park · Heewon Kim


Abstract:

Creating robotic manipulation datasets is traditionally labor-intensive and expensive, requiring extensive manual effort. To alleviate this problem, we introduce PhaseScene, which generates realistic and diverse dynamic scenes (i.e., robotic manipulation data) from text instructions for Embodied AI. PhaseScene employs a phase-specific data representation, dividing each dynamic scene into a static environment and robot movements. Each phase uses a diffusion-based method to generate phase-specific data, incorporating data refinement and augmentation techniques. Our experiments demonstrate that PhaseScene outperforms human creation, achieving roughly 20 times faster generation, 1.84 times higher accuracy, and 28% higher action diversity on standard metrics. In addition, the generated scenes enable accurate agent training, improving the average success rate by 7.96% for PerAct and 11.23% for PerAct-PSA.
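At a high level, the abstract describes a two-phase pipeline: a static environment is generated first, and robot movements are then generated conditioned on that environment, with refinement and augmentation applied to each phase. The sketch below illustrates only this two-stage structure; the class names, method signatures, and the refine/augment placeholders are assumptions made for illustration, not PhaseScene's released code.

```python
"""Minimal sketch of a phase-specific, text-conditioned scene generator.

Illustrative only: StubDiffusionModel, generate_scene, and the refine/augment
placeholders are assumptions, not the authors' API.
"""
from dataclasses import dataclass, field


@dataclass
class DynamicScene:
    static_env: dict                                     # object layout for the static phase
    robot_actions: list = field(default_factory=list)    # per-step robot poses


class StubDiffusionModel:
    """Placeholder standing in for a trained text-conditioned diffusion sampler."""
    def sample(self, text: str, context=None):
        return {"prompt": text, "context": context}


def refine(env: dict) -> dict:
    return env          # placeholder for environment refinement (e.g. removing implausible placements)


def augment(actions: list) -> list:
    return actions      # placeholder for action augmentation (e.g. trajectory perturbation)


def generate_scene(instruction: str,
                   env_model=None,
                   action_model=None) -> DynamicScene:
    env_model = env_model or StubDiffusionModel()
    action_model = action_model or StubDiffusionModel()

    # Phase 1: sample a static environment from the text instruction, then refine it.
    env = refine(env_model.sample(instruction))

    # Phase 2: sample robot movements conditioned on the instruction and the
    # generated environment, then augment them.
    actions = augment([action_model.sample(instruction, context=env)])

    return DynamicScene(static_env=env, robot_actions=actions)


if __name__ == "__main__":
    print(generate_scene("put the red block in the drawer"))
```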
