Action-aware Masking Network with Group-based Attention for Temporal Action Localization

Cited by: 3
Authors
Kang, Tae-Kyung [1]
Lee, Gun-Hee [2]
Jin, Kyung-Min [1]
Lee, Seong-Whan [1]
Affiliations
[1] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
[2] Korea Univ, Dept Comp Sci & Engn, Seoul, South Korea
DOI
10.1109/WACV56688.2023.00600
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal Action Localization (TAL) is a significant and challenging task that searches for subtle human activities in an untrimmed video. To extract snippet-level video features, existing TAL methods commonly use video encoders pre-trained on short-video classification datasets. However, the snippet-level features can be ambiguous between consecutive frames because they carry only short and limited temporal information, which disrupts the precise prediction of action instances. Several methods incorporating temporal relations have been proposed to mitigate this problem; however, they still suffer from poor video features. To address this issue, we propose a novel temporal action localization framework called the Action-aware Masking Network (AMNet). Our method simultaneously refines video features using action-aware attention and considers inherent temporal relations using self-attention and cross-attention mechanisms. First, we present an Action Masking Encoder (AME) that generates an action-aware mask to represent positive characteristics, which is then used to refine snippet-level features to be more salient around actions. Second, we design a Group Attention Module (GAM), which models relations of temporal information and exchanges mutual information by dividing the features into two groups, i.e., long and short groups. Extensive experiments and ablation studies on two primary benchmark datasets demonstrate the effectiveness of AMNet, and our method achieves state-of-the-art performance on THUMOS-14 and ActivityNet-1.3.
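The two ideas named in the abstract, rescaling snippet features with an action-aware mask and letting two feature groups exchange information through self-attention and cross-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the mask is computed or how the long and short groups are formed, so the linear scoring vector `w`, the even split along the channel axis, the residual connections, and the function names `action_mask_refine` and `group_attention` are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention; q is (Tq, D), k and v are (Tk, D)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def action_mask_refine(feats, w):
    """Hypothetical masking step (assumption, not the paper's AME).

    A per-snippet score in (0, 1) is computed from a linear projection
    and used to rescale each snippet's feature vector.
    """
    mask = 1.0 / (1.0 + np.exp(-(feats @ w)))  # shape (T,)
    return feats * mask[:, None], mask


def group_attention(feats):
    """Hypothetical two-group attention (assumption, not the paper's GAM).

    Channels are split evenly into two groups. Each group first attends
    to itself, then queries the other group, with residual connections.
    Requires an even number of channels.
    """
    d = feats.shape[1] // 2
    long_g, short_g = feats[:, :d], feats[:, d:]
    # Self-attention within each group.
    long_g = long_g + attention(long_g, long_g, long_g)
    short_g = short_g + attention(short_g, short_g, short_g)
    # Cross-attention: each group queries the other one.
    long_out = long_g + attention(long_g, short_g, short_g)
    short_out = short_g + attention(short_g, long_g, long_g)
    return np.concatenate([long_out, short_out], axis=1)


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))  # 8 snippets, 16 channels (toy sizes)
w = rng.standard_normal(16)           # hypothetical scoring vector
refined, mask = action_mask_refine(feats, w)
out = group_attention(refined)
```

In this toy setup `refined` and `out` keep the input shape of 8 snippets by 16 channels, and `mask` holds one weight per snippet. A trained model would learn the scoring and attention projections rather than use random or identity ones.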
Pages: 6047-6056
Page count: 10