End-to-End Hyper-Relational Information Extraction for Engineering Diagrams via Dynamically Tokenized Relation Transformer
Abstract
Engineering diagrams are the core carriers of technical information in industrial settings, and the pressing industrial demand for their digitization has driven substantial progress in related research. However, existing approaches still suffer from three limitations. First, detecting symbols, lines, and text typically requires multiple independent models, which results in cumbersome workflows. Second, high-resolution diagrams often impose an excessive computational cost on existing models. Third, parsing frameworks based solely on object detection can only localize components; they fail to capture the topological connection semantics and structured knowledge among components, which limits their usefulness in industrial applications. To address these issues, we propose an end-to-end information extraction framework based on the Dynamically Tokenized Relation Transformer (DTRT), which dynamically reduces the number of image tokens it processes, filters redundant information, and efficiently extracts structural knowledge to construct hyper-relational knowledge graphs. We evaluate our model on piping and instrumentation diagrams (P&IDs) and electrical diagrams (EDs): the former are widely used in chemical engineering enterprises, while the latter describe circuit systems. DTRT achieves an R@1000 of 94.84% on P&IDs and an R@200 of 92.52% on EDs at a significantly reduced computational cost.