SketchRevive: Fine-Grained Pixel-to-Vector Sketch Completion with Diffusion-Prior–Guided Multimodal LLMs
Abstract
Transforming sparse, partial pixel sketches drawn on diverse media into complete, editable vector drawings is essential yet underexplored in digital creation. Prior methods either generate from scratch or inpaint local gaps without predicting global structure, yielding coarse contours and limited detail. To address this, we introduce SketchRevive, a two-stage framework for fine-grained pixel-to-vector sketch completion that couples diffusion-based pixel completion with MLLM-driven refinement and vectorization to produce coherent, detail-faithful SVG results. Specifically, we first construct a practical benchmark by augmenting stroke-annotated sketches captured from paper and whiteboards. Stage I trains a diffusion model with a line-distribution head that predicts per-pixel stroke presence, producing completions consistent in both structure and appearance. Stage II fine-tunes an MLLM for structure-aware SVG vectorization with iterative refinement, optimized with instance-level stroke-attribute similarities. To align key cues (e.g., spatial structure and appearance details) across the two stages, we introduce a diffusion-prior aggregated encoding module that injects multi-scale UNet features from Stage I into the MLLM's visual embeddings and uses line-prediction logits for token compression, prioritizing informative tokens. Experiments show that SketchRevive produces topology-coherent vector completions with high fidelity and recognizability while preserving user intent, making it well suited to interactive creation and artistic design.
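The diffusion-prior aggregated encoding module can be sketched at a high level as two tensor operations: adding projected multi-scale UNet features to the visual embeddings, then keeping only the tokens with the highest line-prediction logits. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation; the function names `inject_prior` and `compress_tokens`, the additive fusion, the per-scale projection matrices, and the `keep_ratio` parameter are all hypothetical.

```python
import numpy as np

def inject_prior(visual_emb, unet_feats, proj_mats):
    """Add projected multi-scale diffusion (UNet) features to visual embeddings.

    visual_emb: (n_tokens, d) MLLM visual embeddings.
    unet_feats: list of (n_tokens, d_i) per-scale UNet features,
                assumed already pooled/resampled to the token grid.
    proj_mats:  list of (d_i, d) projection matrices, one per scale
                (learned in a real system; fixed here for illustration).
    """
    fused = visual_emb.copy()
    for feats, proj in zip(unet_feats, proj_mats):
        fused += feats @ proj  # simple additive fusion (an assumption)
    return fused

def compress_tokens(visual_emb, line_logits, keep_ratio=0.5):
    """Keep tokens with the largest line-prediction logits.

    Sorting the kept indices preserves the original token order,
    so spatial reading order is not scrambled.
    """
    n = visual_emb.shape[0]
    k = max(1, int(n * keep_ratio))
    keep = np.sort(np.argsort(-line_logits)[:k])
    return visual_emb[keep], keep
```

In a real pipeline the logits would come from the Stage I line-distribution head and the fused tokens would feed the MLLM; here random arrays stand in for both.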