Diffusion-based framework for weakly-supervised temporal action localization

Cited by: 0
Authors
Zou, Yuanbing [1]
Zhao, Qingjie [1]
Sarker, Prodip Kumar [1]
Li, Shanshan [1]
Wang, Lei [2]
Liu, Wangwang [2]
Affiliations
[1] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
[2] Beijing Institute of Control Engineering, Beijing 100190, China
Keywords
Adversarial machine learning - Contrastive Learning - Federated learning - Semantics - Semi-supervised learning
DOI
10.1016/j.patcog.2024.111207
Abstract
Weakly supervised temporal action localization aims to localize action instances using only video-level supervision. Because frame-level annotations are absent, effectively separating action snippets from background within semantically ambiguous features is an arduous challenge for this task. To address this issue from a generative modeling perspective, we propose a novel two-stage diffusion-based network. In the first stage, we design a local masking mechanism module that learns local semantic information and generates binary masks, which (1) are used to perform action-background separation and (2) serve as the pseudo-ground truth required by the diffusion module. In the second stage, we propose a diffusion module that generates high-quality action predictions under this pseudo-ground-truth supervision. In addition, we optimize the new-refining operation in the local masking module to improve its efficiency. Experimental results demonstrate that the proposed method achieves promising performance on the publicly available mainstream datasets THUMOS14 and ActivityNet. The code is available at https://github.com/Rlab123/action_diff. © 2024
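To make the two-stage idea more concrete, the following is a minimal sketch under stated assumptions; it is not the authors' implementation (see the linked repository for that). The abstract does not specify how the local masking module binarizes snippets, so `local_binary_mask` uses a hypothetical rule: a snippet is marked as action if its actionness score is at least the mean of its local window and above the video-wide mean. The second stage is illustrated with the standard DDPM forward (noising) process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, applied to the binary pseudo-ground-truth mask rescaled to {−1, 1}; the linear β schedule and all function names (`linear_betas`, `alpha_bar`, `q_sample`) are illustrative assumptions.

```python
import math
import random
from typing import List


def local_binary_mask(scores: List[float], window: int = 2, eps: float = 1e-9) -> List[int]:
    """Hypothetical local masking rule (not the paper's exact module).

    Snippet i is marked as action (1) when its score is at least the mean of
    its local neighbourhood [i - window, i + window] AND above the global mean;
    otherwise it is marked as background (0). `eps` absorbs float rounding on
    flat plateaus, where a score equals its local mean.
    """
    n = len(scores)
    global_mean = sum(scores) / n
    mask = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local_mean = sum(scores[lo:hi]) / (hi - lo)
        is_action = scores[i] >= local_mean - eps and scores[i] > global_mean
        mask.append(1 if is_action else 0)
    return mask


def linear_betas(steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear noise schedule beta_0..beta_{steps-1} (assumed, DDPM-style)."""
    if steps < 2:
        raise ValueError("need at least 2 diffusion steps")
    return [beta_start + (beta_end - beta_start) * k / (steps - 1) for k in range(steps)]


def alpha_bar(betas: List[float], t: int) -> float:
    """Cumulative product of (1 - beta_k) for k = 0..t."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod


def q_sample(mask: List[int], t: int, betas: List[float], rng: random.Random) -> List[float]:
    """DDPM forward process applied to a binary pseudo-ground-truth mask.

    The mask is rescaled from {0, 1} to {-1, 1}, then mixed with Gaussian
    noise: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    """
    ab = alpha_bar(betas, t)
    x0 = [2.0 * m - 1.0 for m in mask]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    # Toy snippet-level actionness scores for a 10-snippet video.
    scores = [0.1, 0.2, 0.8, 0.9, 0.85, 0.3, 0.1, 0.7, 0.75, 0.2]
    mask = local_binary_mask(scores, window=1)
    print(mask)  # → [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]

    betas = linear_betas(1000)
    noisy = q_sample(mask, t=500, betas=betas, rng=random.Random(0))
    print([round(v, 3) for v in noisy])
```

In a setup like this, a denoising network would be trained to recover x_0 (or the noise ε) from x_t conditioned on the video features, so that at inference time it can iteratively denoise pure noise into an action/background prediction; that reverse step is omitted here because the abstract gives no details about it.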