ShapeAR: Generating Editable Shape Layers via Autoregressive Diffusion
Abstract
We present ShapeAR, an autoregressive latent diffusion framework that decomposes raster images into editable, artist-like vector shape layers. Unlike conventional raster-to-SVG methods that rely on boundary tracing or joint path optimization, ShapeAR generates non-overlapping RGBA shape layers directly in latent space via flow-matching diffusion. To scale generation to complex scenes with many shapes, we formulate the process autoregressively, conditioning each step on both the input image (global context) and the partial composition of previously generated layers (local context). In addition, we propose geometry-aware evaluation metrics that quantify the aesthetic and structural quality of the generated shapes, enabling more rigorous assessment than pixel-level reconstruction alone. Compared with conventional raster-to-SVG methods, ShapeAR produces cleaner decompositions and more coherent vector layers.
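The autoregressive formulation described above can be illustrated with a minimal sketch of the control flow only. This is not the ShapeAR implementation: the function names (`decompose`, `composite_over`, `toy_sample_layer`) are hypothetical, images are assumed to be float arrays in [0, 1], and `toy_sample_layer` is a deterministic stand-in for the learned flow-matching sampler, which in the actual method would operate in latent space. The sketch shows only how each step sees the input image (global context) and the composition of earlier layers (local context), and how non-overlapping RGBA layers accumulate.

```python
import numpy as np


def composite_over(canvas, layer):
    """Alpha-composite an RGBA layer over an RGBA canvas (straight alpha)."""
    a_l = layer[..., 3:4]
    a_c = canvas[..., 3:4]
    a_out = a_l + a_c * (1.0 - a_l)
    rgb = layer[..., :3] * a_l + canvas[..., :3] * a_c * (1.0 - a_l)
    safe = np.where(a_out > 0, a_out, 1.0)  # avoid division by zero
    return np.concatenate([rgb / safe, a_out], axis=-1)


def toy_sample_layer(image, canvas):
    """Hypothetical stand-in for the learned sampler (assumption, not the
    paper's model): returns the most common still-uncovered color region as
    an RGBA layer, or None when the canvas is fully covered."""
    uncovered = canvas[..., 3] == 0
    if not uncovered.any():
        return None
    colors, counts = np.unique(image[uncovered], axis=0, return_counts=True)
    color = colors[counts.argmax()]
    mask = uncovered & np.all(image == color, axis=-1)
    layer = np.zeros(image.shape[:2] + (4,))
    layer[mask, :3] = color
    layer[mask, 3] = 1.0
    return layer


def decompose(image, sample_layer=toy_sample_layer, max_layers=16):
    """Autoregressive loop: each step is conditioned on the input image
    (global context) and the partial composition so far (local context)."""
    canvas = np.zeros(image.shape[:2] + (4,))  # empty, fully transparent
    layers = []
    for _ in range(max_layers):
        layer = sample_layer(image, canvas)
        if layer is None:  # nothing left to explain
            break
        layers.append(layer)
        canvas = composite_over(canvas, layer)
    return layers, canvas


# Usage: a 4x4 image split into a red half and a blue half.
image = np.zeros((4, 4, 3))
image[:, :2] = [1.0, 0.0, 0.0]
image[:, 2:] = [0.0, 0.0, 1.0]
layers, canvas = decompose(image)
```

In this toy setting the loop stops once the canvas is opaque everywhere. A learned sampler would replace `toy_sample_layer` while leaving the surrounding loop unchanged.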