WorldGen: From Text to Traversable and Interactive 3D Worlds
Abstract
We introduce WorldGen, a method for generating large, fully formed, navigable 3D worlds from a single text prompt. Existing approaches to 3D scene generation trade off scene diversity, completeness, and correctness in different ways. WorldGen improves on this trade-off by producing large scenes that are explicitly decomposed into individual, high-quality 3D meshes, which makes them compatible with standard game engines. Our approach first uses a language-driven procedural generator to lay out the scene's basic volumes and navigable regions. An image generator then establishes the scene's theme, style, and details. Next, we obtain a high-quality, compositional 3D reconstruction of the planned scene. This step first uses an image-to-3D model to perform a holistic reconstruction that implicitly determines the shape and location of all scene objects while accounting for context and navigability. The reconstruction is then decomposed into individual entities, each of which is regenerated at higher resolution, with the image generator guiding the synthesis of additional detail. We ablate key design choices and compare qualitatively against existing scene generators, showing that our design addresses many of the challenges they commonly face.
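The stages described above can be sketched as a simple data-flow pipeline. This is a minimal, hypothetical sketch: the types (`Layout`, `Mesh`, `HolisticScene`) and stage functions (`plan_layout`, `generate_reference_image`, `reconstruct_holistic`, `decompose`, `regenerate`) are illustrative names that do not come from WorldGen, and each stage returns placeholder data instead of calling a real model. It only shows which stage consumes which output.

```python
from dataclasses import dataclass, field


# All types and stage functions below are hypothetical placeholders that only
# model the data flow between stages; no real models are invoked.
@dataclass
class Layout:
    prompt: str
    volumes: list[str]      # coarse scene volumes from the procedural generator
    navigable: list[str]    # navigable regions, fixed before any 3D generation


@dataclass
class Mesh:
    name: str
    resolution: str         # "coarse" or "high"


@dataclass
class HolisticScene:
    # Output of one image-to-3D pass over the whole scene: object shapes and
    # placements are determined jointly rather than object by object.
    parts: list[Mesh] = field(default_factory=list)


def plan_layout(prompt: str) -> Layout:
    # Stand-in for the language-driven procedural generator.
    return Layout(prompt, volumes=["house", "tree", "bridge"], navigable=["path"])


def generate_reference_image(layout: Layout) -> dict:
    # Stand-in for the image generator, which sets theme, style, and details
    # while following the planned layout.
    return {"prompt": layout.prompt, "volumes": list(layout.volumes)}


def reconstruct_holistic(image: dict) -> HolisticScene:
    # Stand-in for the image-to-3D model: a single coarse reconstruction.
    return HolisticScene([Mesh(name, "coarse") for name in image["volumes"]])


def decompose(scene: HolisticScene) -> list[Mesh]:
    # Split the holistic reconstruction into individual entities.
    return list(scene.parts)


def regenerate(part: Mesh, image: dict) -> Mesh:
    # Stand-in for per-object regeneration at higher resolution; in the real
    # method the image generator guides this step (`image` is unused here).
    return Mesh(part.name, "high")


def generate_world(prompt: str) -> list[Mesh]:
    layout = plan_layout(prompt)
    image = generate_reference_image(layout)
    scene = reconstruct_holistic(image)
    return [regenerate(part, image) for part in decompose(scene)]


world = generate_world("a medieval village")
print([(m.name, m.resolution) for m in world])
```

Under this reading, object placement is settled once, in the holistic pass, and per-object regeneration only refines each entity's geometry. The result is a list of separate meshes rather than one fused surface, which is what allows each object to be imported into a game engine individually.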