LAM: Language Articulated Object Modelers
Abstract
We introduce LAM, a system that explores the collaboration of large language models and vision-language models to generate articulated objects from text prompts. Our approach differs from previous methods that either rely on input visual structure (e.g., an image) or assemble articulated models from pre-built assets. In contrast, we formulate articulated object generation as a unified code generation task, where geometry and articulations can be co-designed from scratch. Given an input text, LAM coordinates a team of specialized modules to generate code that represents the desired articulated object procedurally. LAM first reasons about the hierarchical structure of parts (links) with the Link Designer, then writes, compiles, and debugs code with the Geometry & Articulation Coders, and self-corrects with the Geometry & Articulation Checkers. The code serves as a structured and interpretable bridge between individual links, ensuring correct relationships among them. Representing everything with code allows the system to determine appropriate joint types and calculate their exact placements more reliably. Experiments demonstrate the power of leveraging code as a generative medium within an agentic system, showcasing its effectiveness in automatically constructing complex articulated objects.
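To make the "code as a generative medium" idea concrete, the following is a minimal, hypothetical sketch of an articulated object expressed procedurally: all function names and the specific URDF-style output are illustrative assumptions, not the paper's actual API. The point it demonstrates is that when links are built in code, joint placements can be computed exactly from the link geometry rather than estimated.

```python
# Hypothetical sketch in the spirit of LAM's code-as-representation idea.
# Names and structure are illustrative, not the paper's actual interface.
# Each link is a simple box; the joint origin is derived from the link
# dimensions, so the placement stays consistent by construction.

def make_link(name, size):
    """Emit a URDF-style link with a box visual of the given (x, y, z) size."""
    sx, sy, sz = size
    return (f'<link name="{name}">'
            f'<visual><geometry><box size="{sx} {sy} {sz}"/></geometry></visual>'
            f'</link>')

def make_revolute_joint(name, parent, child, origin, axis):
    """Emit a URDF-style revolute joint connecting parent to child."""
    ox, oy, oz = origin
    ax, ay, az = axis
    return (f'<joint name="{name}" type="revolute">'
            f'<parent link="{parent}"/><child link="{child}"/>'
            f'<origin xyz="{ox} {oy} {oz}"/><axis xyz="{ax} {ay} {az}"/>'
            f'</joint>')

def cabinet_with_door(w=0.6, d=0.4, h=1.0, door_t=0.02):
    """A cabinet body plus a door hinged at the front-left vertical edge.

    The hinge origin (-w/2, d/2, 0) is computed directly from the body
    dimensions, illustrating how code makes joint placement exact.
    """
    body = make_link("body", (w, d, h))
    door = make_link("door", (w, door_t, h))
    hinge = make_revolute_joint("hinge", "body", "door",
                                origin=(-w / 2, d / 2, 0),
                                axis=(0, 0, 1))
    return f'<robot name="cabinet">{body}{door}{hinge}</robot>'

urdf = cabinet_with_door()
```

In a pipeline like the one the abstract describes, a checker module could then parse and validate this output (e.g., confirming every joint references existing links) before the result is compiled or simulated.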