Deep cascaded action attention network for weakly-supervised temporal action localization

Cited by: 0
Authors
Hui-fen Xia
Yong-zhao Zhan
Institutions
[1] Jiangsu University, School of Computer Science and Communication Engineering
[2] Changzhou Vocational Institute of Mechatronic Technology
[3] Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications
Keywords
Weakly-supervised; Temporal action localization; Deep cascaded action attention; Non-action suppression
DOI: not available
Abstract
Weakly-supervised temporal action localization (W-TAL) aims to locate the boundaries of action instances in an untrimmed video and classify them, a challenging task because only video-level labels are available during training. Existing methods mainly focus on the most discriminative action snippets of a video via top-k multiple instance learning (MIL) and neglect both the less discriminative action snippets and the non-action snippets, which limits the achievable localization performance. To mine the less discriminative action snippets and better distinguish the non-action snippets in a video, a novel method based on a deep cascaded action attention network is proposed. The deep cascaded action attention mechanism models not only the most discriminative action snippets but also successive levels of less discriminative ones by introducing threshold erasing, which ensures the completeness of action instances. In addition, an entropy loss for non-action is introduced to restrict the activations of non-action snippets across all action categories; these activations are generated by aggregating the bottom-k activation scores along the temporal dimension. As a result, action snippets are separated from non-action snippets more reliably, making the localized action instances more accurate and the final localization more precise. Extensive experiments on the THUMOS14 and ActivityNet1.3 datasets show that our method outperforms state-of-the-art methods at several t-IoU thresholds.
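As a rough illustration of the aggregation scheme described in the abstract, the sketch below shows top-k MIL scoring over a class activation sequence, a bottom-k entropy-based non-action loss, and one threshold-erasing step on snippet attention. This is a minimal NumPy sketch under our own assumptions (the function names and the `k` and `threshold` hyperparameters are hypothetical), not the authors' implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def video_level_scores(cas, k):
    """Top-k MIL: for each class, average the k highest activation
    scores over time in the class activation sequence `cas`
    (shape: num_snippets x num_classes) to get video-level scores."""
    topk = np.sort(cas, axis=0)[-k:]   # k highest scores per class
    return topk.mean(axis=0)           # (num_classes,)

def non_action_entropy_loss(cas, k):
    """Aggregate the bottom-k activations along the temporal dimension
    for each class and push the resulting class distribution toward
    uniform (maximum entropy), so that non-action snippets do not
    activate any action category strongly."""
    bottomk = np.sort(cas, axis=0)[:k].mean(axis=0)  # (num_classes,)
    p = softmax(bottomk)
    entropy = -(p * np.log(p + 1e-8)).sum()
    return -entropy  # minimizing this loss maximizes the entropy

def threshold_erase(attention, threshold):
    """One cascade stage: erase (zero out) the most discriminative
    snippets so the next attention branch is forced to attend to the
    less discriminative action snippets."""
    erased = attention.copy()
    erased[attention >= threshold] = 0.0
    return erased
```

In a cascade, `threshold_erase` would be applied repeatedly, each stage discovering a further level of less discriminative action snippets whose attention is then fused with the earlier stages' to cover the complete action instance.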
Pages: 29769-29787
Page count: 18
Related Papers (50 in total)
  • [21] Spatial-temporal correlations learning and action-background jointed attention for weakly-supervised temporal action localization
    Xia, Huifen; Zhan, Yongzhao; Cheng, Keyang
    Multimedia Systems, 2022, 28(04): 1529-1541
  • [22] A Novel Action Saliency and Context-Aware Network for Weakly-Supervised Temporal Action Localization
    Zhao, Yibo; Zhang, Hua; Gao, Zan; Gao, Wenjie; Wang, Meng; Chen, Shengyong
    IEEE Transactions on Multimedia, 2023, 25: 8253-8266
  • [23] Temporal Feature Enhancement Dilated Convolution Network for Weakly-supervised Temporal Action Localization
    Zhou, Jianxiong; Wu, Ying
    2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023: 6017-6026
  • [24] Fusion detection network with discriminative enhancement for weakly-supervised temporal action localization
    Liu, Yuanyuan; Zhu, Hong; Ren, Haohao; Shi, Jing; Wang, Dong
    Expert Systems with Applications, 2024, 238
  • [25] Graph Regularization Network with Semantic Affinity for Weakly-Supervised Temporal Action Localization
    Park, Jungin; Lee, Jiyoung; Jeon, Sangryul; Kim, Seungryong; Sohn, Kwanghoon
    2019 IEEE International Conference on Image Processing (ICIP), 2019: 3701-3705
  • [26] Multi-Dimensional Attention With Similarity Constraint for Weakly-Supervised Temporal Action Localization
    Chen, Zhengyan; Liu, Hong; Zhang, Linlin; Liao, Xin
    IEEE Transactions on Multimedia, 2023, 25: 4349-4360
  • [27] Weakly-supervised temporal action localization using multi-branch attention weighting
    Liu, Mengxue; Li, Wenjing; Ge, Fangzhen; Gao, Xiangjun
    Multimedia Systems, 2024, 30(05)
  • [28] Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
    Lee, Pilhyeon; Byun, Hyeran
    2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 13628-13637
  • [30] AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos
    Shou, Zheng; Gao, Hang; Zhang, Lei; Miyazawa, Kazuyuki; Chang, Shih-Fu
    Computer Vision - ECCV 2018, Pt XVI, 2018, 11220: 162-179