Diffusion-based framework for weakly-supervised temporal action localization

Authors
Zou, Yuanbing [1]
Zhao, Qingjie [1]
Sarker, Prodip Kumar [1]
Li, Shanshan [1]
Wang, Lei [2]
Liu, Wangwang [2]
Affiliations
[1] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
[2] Beijing Institute of Control Engineering, Beijing 100190, China
Keywords
Adversarial machine learning - Contrastive Learning - Federated learning - Semantics - Semi-supervised learning
DOI
10.1016/j.patcog.2024.111207
Abstract
Weakly supervised temporal action localization aims to localize action instances using only video-level supervision. Because frame-level annotations are absent, effectively separating action snippets from background in semantically ambiguous features is a major challenge for this task. To address this issue from a generative modeling perspective, we propose a novel two-stage diffusion-based network. In the first stage, we design a local masking mechanism module that learns local semantic information and generates binary masks, which (1) are used to perform action-background separation and (2) serve as the pseudo-ground truth required by the diffusion module. In the second stage, we propose a diffusion module that generates high-quality action predictions under the supervision of this pseudo-ground truth. In addition, we further optimize the new-refining operation in the local masking module to improve its efficiency. Experimental results demonstrate that the proposed method achieves promising performance on the publicly available mainstream datasets THUMOS14 and ActivityNet. The code is available at https://github.com/Rlab123/action_diff. © 2024