Rethinking Glyph Spatial Information in Font Generation
Abstract
Few-shot Font Generation (FFG) aims to create a complete font from a limited number of reference glyphs, offering significant practical value. However, existing methods neglect glyph spatial information, which leads to two critical limitations. At the pipeline level, distorted rendering introduces spatial bias that impairs vectorization and degrades dataset quality; this problem is compounded by the lack of unified rendering standards, which in turn undermines fair benchmarking. At the model level, the implicit coupling of glyph shape and position hinders fine-grained optimization and generalization. We address these challenges in the context of Chinese font generation, where glyph complexity demands superior model capability. We first propose a Spatial-Preserving Rendering (SPR) protocol that eliminates spatial bias and enables accurate vectorization. Alongside it, we release an OFL-licensed Chinese font dataset to establish a unified benchmark. We then propose GlyphSpatialNet, a two-stage framework that explicitly models glyph spatial information in pixel space. In the first stage, we design a Shape-Position Decoupling (SPD) architecture and a Gradient Broadcasting Module (GBM) to achieve font style transfer at low resolution. In the second stage, we design a Style Detail Enhancement (SDE) module that refines style details for high-resolution outputs. Extensive experiments demonstrate the effectiveness of our approach. Code and dataset are provided in the supplementary materials.