Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

Cited by: 8
Authors
Li, Zhi [1]
He, Lu [2]
Xu, Huijuan [3]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Tencent Amer, Palo Alto, CA USA
[3] Penn State Univ, University Pk, PA USA
Keywords
Fine-grained; Weakly-supervised; Temporal action detection; Atomic actions
DOI
10.1007/978-3-031-20080-9_33
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Action understanding has evolved into the era of fine granularity, as most human behaviors in real life differ only in minor ways. To detect these fine-grained actions accurately in a label-efficient way, we tackle the problem of weakly-supervised fine-grained temporal action detection in videos for the first time. Previous weakly-supervised models for general action detection lack a design that captures subtle differences between fine-grained actions, so they cannot perform well in the fine-grained setting. We propose to model actions as combinations of reusable atomic actions, which are automatically discovered from data through self-supervised clustering, in order to capture the commonality and individuality of fine-grained actions. The learnt atomic actions, represented by visual concepts, are further mapped to fine and coarse action labels by leveraging the semantic label hierarchy. Our approach constructs a visual representation hierarchy of four levels: clip level, atomic action level, fine action class level and coarse action class level, with supervision at each level. Extensive experiments on two large-scale fine-grained video datasets, FineAction and FineGym, show the benefit of the proposed weakly-supervised model for fine-grained action detection, which achieves state-of-the-art results.
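The four-level hierarchy described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: plain k-means stands in for the self-supervised clustering, and the names `kmeans`, `hierarchy_scores`, `atomic_to_fine` and `fine_to_coarse`, as well as all shapes and random inputs, are assumptions made purely for illustration.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for the clustering step."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every clip to every center: shape (T, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(0)
    return centers, assign

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchy_scores(clips, centers, atomic_to_fine, fine_to_coarse, n_coarse):
    """Propagate clip features through the atomic, fine and coarse levels."""
    # Level 1 -> 2: soft assignment of each clip to the atomic actions.
    d = ((clips[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    atomic = softmax(-d, axis=1)                      # (T, K)
    # Level 2 -> 3: hypothetical linear map from atomic to fine classes.
    fine = softmax(atomic @ atomic_to_fine, axis=1)   # (T, F)
    # Level 3 -> 4: a coarse class sums the probabilities of its children.
    coarse = np.zeros((len(clips), n_coarse))
    for f, c in enumerate(fine_to_coarse):
        coarse[:, c] += fine[:, f]
    return atomic, fine, coarse

# Toy example: 20 clips, 8-dim features, 4 atomic actions, 6 fine, 3 coarse.
rng = np.random.default_rng(0)
clips = rng.normal(size=(20, 8))
centers, _ = kmeans(clips, k=4)
atomic_to_fine = rng.normal(size=(4, 6))     # would be learned in practice
fine_to_coarse = np.array([0, 0, 1, 1, 2, 2])  # assumed label hierarchy
atomic, fine, coarse = hierarchy_scores(
    clips, centers, atomic_to_fine, fine_to_coarse, n_coarse=3)
# Video-level scores, the quantity a video-level (weak) label could supervise.
video_fine = fine.mean(0)
video_coarse = coarse.mean(0)
```

In this sketch the coarse scores are derived from the fine scores through the label hierarchy, so the two levels stay consistent by construction; the paper instead describes supervision at each level, which a trained model would need.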
Pages: 567-584
Number of pages: 18
Related papers
50 in total
  • [1] Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
    Gao, Junyu
    Chen, Mengyuan
    Xu, Changsheng
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19967 - 19977
  • [2] AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos
    Shou, Zheng
    Gao, Hang
    Zhang, Lei
    Miyazawa, Kazuyuki
    Chang, Shih-Fu
    [J]. COMPUTER VISION - ECCV 2018, PT XVI, 2018, 11220 : 162 - 179
  • [3] Fine-grained Analysis of Cyberbullying using Weakly-Supervised Topic Models
    Zhang, Yue
    Ramesh, Arti
    [J]. 2018 IEEE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2018, : 504 - 513
  • [4] Skin Lesion Classification Using Weakly-supervised Fine-grained Method
    Xue, Xi
    Kamata, Sei-ichiro
    Luo, Daming
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9083 - 9090
  • [5] A Saliency-based Weakly-supervised Network for Fine-Grained Image Categorization
    Han, Yawen
    Meng, Fang
    [J]. 2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 270 - 274
  • [6] Weakly-Supervised Learning for Fine-Grained Emotion Recognition Using Physiological Signals
    Zhang, Tianyi
    El Ali, Abdallah
    Wang, Chen
    Hanjalic, Alan
    Cesar, Pablo
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 2304 - 2322
  • [7] Fine-grained Action Detection in Untrimmed Surveillance Videos
    Aakur, Sathyanarayanan
    Sawyer, Daniel
    Sarkar, Sudeep
    [J]. 2019 IEEE WINTER APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW), 2019, : 38 - 40
  • [8] Weakly-Supervised Visual Instrument-Playing Action Detection in Videos
    Liu, Jen-Yu
    Yang, Yi-Hsuan
    Jeng, Shyh-Kang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (04) : 887 - 901
  • [9] Weakly-Supervised Fine-Grained Event Recognition on Social Media Texts for Disaster Management
    Yao, Wenlin
    Zhang, Cheng
    Saravanan, Shiva
    Huang, Ruihong
    Mostafavi, Ali
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 532 - 539
  • [10] Hand Detection and Tracking in Videos for Fine-Grained Action Recognition
    Do, Nga H.
    Yanai, Keiji
    [J]. COMPUTER VISION - ACCV 2014 WORKSHOPS, PT I, 2015, 9008 : 19 - 34