ProcessMaker: A Generalized Process Visualization Framework with Adaptive Sequence Steps on Diffusion Transformers
Abstract
Procedural sequence generation aims to generate the intermediate images of a multi-step process and has applications in industrial design, educational tutorials, and creative content inspiration. However, existing methods often focus on a specific domain or initialize several expert networks for different domains, and they face three challenges: first, poor generalization to unseen domains; second, parameter redundancy caused by the multiple expert networks; and third, difficulty in adaptively determining the number of generation steps for different processes. To address these challenges, we propose ProcessMaker, a novel framework that harnesses the inherent generalization capabilities of Diffusion Transformers (DiTs) for procedural sequence generation. Concretely, we introduce three key innovations: (1) Self-supervised Representation Alignment, which promotes generalization to unseen processes; (2) Sparse Masks, which handle different domains without additional expert networks; and (3) a sliding window strategy, which dynamically adapts the number of generation steps to the complexity of the process. Extensive experiments validate that ProcessMaker achieves procedural sequence generation with generalization ability and adaptive steps, while using only 7.3% of the trainable parameters of the state-of-the-art method.