Frame-Level Label Refinement for Skeleton-Based Weakly-Supervised Action Recognition

Authors
Yu, Qing [1]
Fujiwara, Kent [2]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] LINE Corp, Tokyo, Japan
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, skeleton-based action recognition has achieved remarkable performance in understanding human motion from sequences of skeleton data, which is an important medium for synthesizing realistic human movement in various applications. However, existing methods assume that each action clip is manually trimmed to contain one specific action, which requires a significant amount of annotation effort. To address this, we consider a novel problem, skeleton-based weakly-supervised temporal action localization (S-WTAL), in which human action segments must be recognized and localized in untrimmed skeleton videos given only video-level labels. Although this task is challenging due to the sparsity of skeleton data and the lack of contextual clues from interaction with other objects and the environment, we present a frame-level label refinement framework based on a spatio-temporal graph convolutional network (ST-GCN) to overcome these difficulties. We use multiple instance learning (MIL) with video-level labels to generate the frame-level predictions. Inspired by advances in handling the noisy label problem, we introduce a label cleaning strategy for the frame-level pseudo labels to guide the learning process. The network parameters and the frame-level predictions are alternately updated to obtain the final results. We extensively evaluate the effectiveness of our learning approach on skeleton-based action recognition benchmarks. The state-of-the-art experimental results demonstrate that the proposed method can recognize and localize action segments in skeleton data.
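The abstract does not say how video-level labels are turned into frame-level predictions, so the following is only a minimal sketch of one common MIL formulation, not the authors' method. It assumes top-k mean pooling over per-frame class scores and a fixed confidence threshold for pseudo labels; the function names `topk_mean_pool` and `frame_pseudo_labels`, the value of `k`, and the threshold are all hypothetical choices made for illustration.

```python
from typing import List


def topk_mean_pool(frame_scores: List[List[float]], k: int) -> List[float]:
    """Aggregate per-frame class scores (T x C) into one video-level score per class.

    Assumption: the video-level score of a class is the mean of its k highest
    frame-level scores, a pooling often used in MIL-based localization.
    """
    num_classes = len(frame_scores[0])
    k = max(1, min(k, len(frame_scores)))  # clamp k to the number of frames
    video_scores = []
    for c in range(num_classes):
        column = sorted((frame[c] for frame in frame_scores), reverse=True)
        video_scores.append(sum(column[:k]) / k)
    return video_scores


def frame_pseudo_labels(
    frame_scores: List[List[float]],
    video_labels: List[int],
    threshold: float = 0.5,
) -> List[int]:
    """Assign each frame a pseudo label restricted to the video-level labels.

    A frame gets the best-scoring class among the classes known to occur in
    the video, or -1 (background) if that score is below the threshold.
    """
    labels = []
    for frame in frame_scores:
        best = max(video_labels, key=lambda c: frame[c])
        labels.append(best if frame[best] >= threshold else -1)
    return labels


# Toy example: 4 frames, 2 classes; the video is known to contain class 0 only.
scores = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.1],
    [0.1, 0.7],
]
video_scores = topk_mean_pool(scores, k=2)
pseudo = frame_pseudo_labels(scores, video_labels=[0])
```

In an alternating scheme like the one the abstract outlines, pseudo labels of this kind would be cleaned and then used as frame-level targets in the next round of network updates; the cleaning step itself is not specified in the abstract and is left out here.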
Pages: 3322-3330
Number of pages: 9