FontCrafter: High-Fidelity Element-Driven Artistic Font Creation with Visual In-Context Generation
Abstract
Artistic font generation aims to synthesize stylized glyphs based on a reference style. However, existing approaches suffer from limited style diversity and coarse control. In this work, we explore the potential of element-driven artistic font generation. Elements are the fundamental visual units of a font and serve as reference images for the desired style. Conceptually, we categorize elements into object elements (e.g., flowers or stones), which have distinct structures, and amorphous elements (e.g., flames or clouds), which have unstructured textures. We introduce FontCrafter, an element-driven framework for font creation, and construct a large-scale dataset, ElementFont, which comprises a diverse set of element types and high-quality glyph images. Achieving high-fidelity reconstruction of both the texture and the structure of reference elements nevertheless remains challenging. To address this, we propose an in-context generation strategy that treats element images as visual context and uses an inpainting model to transfer element styles into glyph regions at the pixel level. To further control glyph shapes, we design a lightweight Context-aware Mask Adapter (CMA) that injects shape information while maintaining style consistency. Moreover, a training-free attention redirection mechanism enables region-aware style control and suppresses stroke hallucination. Extensive experiments demonstrate that FontCrafter achieves strong zero-shot generation performance, especially in preserving structural and textural fidelity, while supporting flexible controls such as style mixture. The model and dataset will be made publicly available.
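As an illustration of the in-context generation idea, the following minimal sketch shows one way an inpainting input could be assembled: the element image is placed next to an empty glyph canvas, and the inpainting mask marks only the glyph region for synthesis. The side-by-side layout, the function name `build_incontext_input`, and the `fill` parameter are assumptions made for this example and are not taken from FontCrafter itself; the inpainting model is left out entirely.

```python
from typing import List, Tuple

Grid = List[List[int]]  # a tiny grayscale image stored as rows of pixel values


def build_incontext_input(
    element: Grid, glyph_mask: Grid, fill: int = 0
) -> Tuple[Grid, Grid]:
    """Place the element image to the left of an empty glyph canvas.

    Returns (canvas, inpaint_mask). In inpaint_mask, 1 marks pixels an
    inpainting model would be asked to fill (the glyph region on the right),
    and 0 marks pixels to keep (the element context and the background).
    The side-by-side layout is an assumption made for illustration.
    """
    if len(element) != len(glyph_mask):
        raise ValueError("element and glyph_mask must have the same height")

    canvas: Grid = []
    inpaint_mask: Grid = []
    for elem_row, mask_row in zip(element, glyph_mask):
        # Left half: the untouched reference element (visual context).
        # Right half: placeholder pixels where the glyph will be generated.
        canvas.append(list(elem_row) + [fill] * len(mask_row))
        # Only the glyph strokes on the right half are marked for inpainting.
        inpaint_mask.append([0] * len(elem_row) + [1 if m else 0 for m in mask_row])
    return canvas, inpaint_mask


# Toy 2x2 element and 2x2 glyph mask.
element = [[7, 8], [9, 6]]
glyph_mask = [[0, 1], [1, 1]]
canvas, inpaint_mask = build_incontext_input(element, glyph_mask)
```

Because the element pixels stay visible in the same canvas, an inpainting model filling the masked glyph region can attend to them directly, which is the intuition behind treating the element as visual context.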