Global Information Thresholding for Sufficient and Necessary Circuits
Abstract
We study the problem of extracting causal circuits: small, edge-level subgraphs of a trained network that are sufficient on their own and necessary for the model’s behavior, under explicit error control. Prior work largely optimizes observational rankings or applies ad hoc sparsification, which can sever paths, ignore inhibitory edges, and admit "ghost" components that fail under intervention. We recast circuit discovery as information-constrained selection rather than ranking. A single global threshold selects edges by their marginal contribution, and a statistical threshold derived from a null hypothesis controls the family-wise error rate. Edge scores are computed by rank-consistent attribution aligned with the task metric, stabilized by Fisher-diagonal variance normalization, and projected onto a path-preserving edge coordinate system. The selected edges are then enforced with hard gates so that interventions have well-defined semantics. We propose an evaluation protocol that prioritizes sufficiency and necessity (CPR, CMD), editability, and error rates, alongside standard ranking metrics. The result is a small, path-faithful circuit with reproducible selection criteria. Our aim is to replace visually appealing heatmaps with interventional guarantees and explicit error control.
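The abstract does not spell out the scoring or testing procedures. As a rough illustration only, the sketch below shows how Fisher-normalized edge scores, one global FWER-controlled threshold, and hard gating could fit together. Several pieces are assumptions, not the paper's method. The per-example attribution matrix `attr` and the per-example gate gradients `grads` are hypothetical inputs. The empirical Fisher diagonal is taken as the mean squared gradient. The null is a sign-flip max-statistic test (Westfall-Young style), swapped in here as one standard way to obtain a single threshold with approximate family-wise error control. Rank-consistent attribution and the path-preserving projection are not shown.

```python
import numpy as np

def fisher_normalized_scores(attr, grads, eps=1e-8):
    """|mean attribution| per edge, divided by sqrt of the empirical Fisher diagonal.

    attr, grads: arrays of shape (n_examples, n_edges); both are hypothetical inputs
    (grads = per-example gradients of the task log-likelihood w.r.t. each edge gate).
    Taking the absolute value keeps inhibitory (negative-effect) edges in play.
    """
    fisher_diag = np.mean(grads ** 2, axis=0)          # assumed empirical Fisher diagonal
    return np.abs(attr.mean(axis=0)) / np.sqrt(fisher_diag + eps)

def global_fwer_threshold(attr, grads, alpha=0.05, n_perm=1000, seed=0, eps=1e-8):
    """One threshold shared by all edges, from a sign-flip max-statistic null.

    Assumes that under H0 each example's attribution vector is symmetric about zero,
    so flipping whole rows leaves its distribution unchanged. The (1 - alpha) quantile
    of the max over edges approximately controls the family-wise error rate.
    """
    rng = np.random.default_rng(seed)
    denom = np.sqrt(np.mean(grads ** 2, axis=0) + eps)
    n = attr.shape[0]
    null_max = np.empty(n_perm)
    for b in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n, 1))   # one sign per example, shared across edges
        null_max[b] = np.max(np.abs((signs * attr).mean(axis=0)) / denom)
    return float(np.quantile(null_max, 1.0 - alpha))

def hard_gate(scores, tau):
    """Binary edge mask: 1 keeps the edge in the circuit, 0 ablates it."""
    return (scores > tau).astype(np.int8)

# Synthetic demo (hypothetical data): 5 real edges (3 excitatory, 2 inhibitory) among 50.
rng = np.random.default_rng(1)
n_examples, n_edges = 200, 50
true_effect = np.zeros(n_edges)
true_effect[:5] = [1.0, 1.0, 1.0, -1.0, -1.0]
attr = true_effect + rng.normal(size=(n_examples, n_edges))
grads = np.ones((n_examples, n_edges))                 # placeholder: unit Fisher diagonal

scores = fisher_normalized_scores(attr, grads)
tau = global_fwer_threshold(attr, grads, alpha=0.05)
mask = hard_gate(scores, tau)
print(f"tau = {tau:.3f}, selected edges: {np.flatnonzero(mask).tolist()}")
```

Thresholding the maximum statistic, rather than testing each edge separately, is what makes the cutoff global. Using absolute values lets inhibitory edges pass the gate as readily as excitatory ones.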