Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation

Kaiyuan Liu · Yunheng Li · Shenglan Liu · Chenwei Tan · Zihang Shao

West Building Exhibit Halls ABC 227


Timestamp supervised temporal action segmentation (TSTAS) is more cost-effective than fully supervised counterparts. However, previous approaches suffer from severe label bias due to over-reliance on sparse timestamp annotations, resulting in unsatisfactory performance. In this paper, we propose the Debiasing-TSTAS (D-TSTAS) framework by exploiting unannotated frames to alleviate this bias from two phases: 1) Initialization. To reduce the dependencies on annotated frames, we propose masked timestamp predictions (MTP) to ensure that initialized model captures more contextual information. 2) Refinement. To overcome the limitation of the expressiveness from sparsely annotated timestamps, we propose a center-oriented timestamp expansion (CTE) approach to progressively expand pseudo-timestamp groups which contain semantic-rich motion representation of action segments. Then, these pseudo-timestamp groups and the model output are used to iteratively generate pseudo-labels for refining the model in a fully supervised setup. We further introduce segmental confidence loss to enable the model to have high confidence predictions within the pseudo-timestamp groups and more accurate action boundaries. Our D-TSTAS outperforms the state-of-the-art TSTAS method as well as achieves competitive results compared with fully supervised approaches on three benchmark datasets.

Chat is not available.