CADFS: A Big CAD Program Dataset and Framework for Computer-Aided Design with Large Language Models
Abstract
We introduce CADFS, a data-centric framework that enables large vision-language models (VLMs) to generate complex CAD design histories. Existing generative CAD systems are restricted to sketch-and-extrude operations due to simplified representations and limited datasets. We address this by introducing a FeatureScript-based representation and constructing a dataset of 450k real-world CAD models spanning 15 modeling operations, obtained via a new pipeline that reconstructs clean, executable FeatureScript programs and provides multimodal annotations. Fine-tuning a VLM on this representation yields state-of-the-art results in text-conditioned CAD generation and image-based reconstruction, producing more accurate, diverse, and feature-rich designs than prior frameworks. Ablations show that FeatureScript, the expanded operation set, and representation-aligned textual descriptions all significantly improve performance. Our framework substantially broadens the complexity and realism achievable in generative CAD.