GLFormer: Global and Local Context Aggregation Network for Temporal Action Detection

Cited by: 1
Authors
He, Yilong [1, 2]
Zhong, Yong [1, 2]
Wang, Lishun [1, 2]
Dang, Jiachen [1, 2]
Affiliations
[1] Chinese Acad Sci, Chengdu Inst Comp Applicat, Chengdu 610081, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / No. 17
Keywords
temporal action detection; computer vision; deep learning; artificial intelligence; HISTOGRAMS; FLOW
DOI
10.3390/app12178557
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
As a core component of video analysis, Temporal Action Localization (TAL) has achieved remarkable success. However, two issues remain insufficiently addressed. First, most existing methods process local context in isolation, without explicitly exploiting the relations among all the features of an action instance as a whole. Second, action durations vary widely, which makes it difficult to choose a suitable temporal receptive field. To address these issues, this paper proposes GLFormer, a novel network that aggregates short-, medium-, and long-range temporal context. It consists of three independent branches with different attention ranges, whose output features are concatenated along the temporal dimension to obtain richer representations. The first branch, multi-scale local convolution (MLC), applies several 1D convolutions with different kernel sizes to capture multi-scale context. The second, window self-attention (WSA), models the relationships among features within a local window. The third, global attention (GA), establishes long-range dependencies across the full sequence. In addition, we design a feature pyramid structure to handle action instances of various durations. GLFormer achieves state-of-the-art performance on two challenging video benchmarks, THUMOS14 and ActivityNet 1.3, reaching 67.2% and 54.5% AP@0.5, respectively.
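The abstract describes the three-branch block and the feature pyramid only at a high level. The sketch below is a minimal, dependency-free Python illustration of that idea, not the authors' implementation. Several details are assumptions made for illustration: fixed box (uniform) kernels stand in for learned convolution weights, the MLC outputs are fused by averaging, attention is single-head with Q = K = V = input (no learned projections), WSA uses non-overlapping windows, and the pyramid halves the temporal length by averaging adjacent pairs. All function names (`conv1d_box`, `mlc`, `attend`, `wsa`, `ga`, `glformer_block`, `pyramid`) are hypothetical. Branch outputs are joined along the temporal dimension, following the abstract's wording literally.

```python
import math

Seq = list[list[float]]  # T time steps, each a C-dim feature vector


def conv1d_box(x: Seq, k: int) -> Seq:
    """1D convolution over time with a uniform kernel of odd size k and zero padding.
    The fixed box kernel is a stand-in for learned weights (assumption)."""
    T, C = len(x), len(x[0])
    r = k // 2
    out = []
    for t in range(T):
        acc = [0.0] * C
        for s in range(t - r, t + r + 1):
            if 0 <= s < T:  # positions outside the sequence contribute zero
                for c in range(C):
                    acc[c] += x[s][c]
        out.append([v / k for v in acc])
    return out


def mlc(x: Seq, kernel_sizes=(3, 5, 7)) -> Seq:
    """Multi-scale local convolution: average the outputs of several kernel sizes
    (the averaging fusion rule is an assumption)."""
    outs = [conv1d_box(x, k) for k in kernel_sizes]
    return [[sum(o[t][c] for o in outs) / len(outs) for c in range(len(x[0]))]
            for t in range(len(x))]


def attend(x: Seq, idx: list[int]) -> Seq:
    """Single-head scaled dot-product self-attention among the time steps in idx,
    with Q = K = V = x (no learned projections, an assumption)."""
    C = len(x[0])
    out = []
    for i in idx:
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(C) for j in idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * x[j][c] for wj, j in zip(w, idx)) / z for c in range(C)])
    return out


def wsa(x: Seq, window: int = 4) -> Seq:
    """Window self-attention: each step attends only within its non-overlapping window."""
    out = []
    for start in range(0, len(x), window):
        out.extend(attend(x, list(range(start, min(start + window, len(x))))))
    return out


def ga(x: Seq) -> Seq:
    """Global attention: every time step attends to the full sequence."""
    return attend(x, list(range(len(x))))


def glformer_block(x: Seq) -> Seq:
    """Run the three branches and concatenate their outputs along the temporal axis."""
    return mlc(x) + wsa(x) + ga(x)


def pyramid(x: Seq, levels: int = 3) -> list[Seq]:
    """Feature pyramid: halve the temporal length at each level by averaging
    adjacent pairs (downsampling rule assumed)."""
    feats = [x]
    for _ in range(levels - 1):
        prev = feats[-1]
        feats.append([[(a + b) / 2 for a, b in zip(prev[t], prev[t + 1])]
                      for t in range(0, len(prev) - 1, 2)])
    return feats
```

For an 8-step input such as `x = [[float(t), 1.0] for t in range(8)]`, `glformer_block(x)` returns 24 vectors because the three branch outputs are stacked along time, and `pyramid(x)` yields levels of length 8, 4, and 2. The three branches differ only in how far each time step can see: a few neighbours (MLC), one window (WSA), or the whole sequence (GA). A trained model would instead learn the kernels and the Q/K/V projections and would typically use multiple attention heads.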
Pages: 16
Related papers (50 in total)
  • [1] Local and global context cooperation for temporal action detection
    Wu, Lanxi
    Xu, Luhui
    [J]. Multimedia Systems, 2024, 30 (06)
  • [2] Temporal Context Aggregation Network for Temporal Action Proposal Refinement
    Qing, Zhiwu
    Su, Haisheng
    Gan, Weihao
    Wang, Dongliang
    Wu, Wei
    Wang, Xiang
    Qiao, Yu
    Yan, Junjie
    Gao, Changxin
    Sang, Nong
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 485 - 494
  • [3] MBGNet: Multi-branch boundary generation network with temporal context aggregation for temporal action detection
    Pan, Xiaoying
    Zhang, Nijuan
    Xie, Hewei
    Li, Shoukun
    Feng, Tong
    [J]. APPLIED INTELLIGENCE, 2024, 54 (19) : 9045 - 9066
  • [4] EAR: Efficient action recognition with local-global temporal aggregation
    Zhang, Can
    Zou, Yuexian
    Chen, Guang
    Gan, Lei
    [J]. IMAGE AND VISION COMPUTING, 2021, 116 (116)
  • [5] DCAN: Improving Temporal Action Detection via Dual Context Aggregation
    Chen, Guo
    Zheng, Yin-Dong
    Wang, Limin
    Lu, Tong
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 248 - 257
  • [6] Non-Local Temporal Difference Network for Temporal Action Detection
    He, Yilong
    Han, Xiao
    Zhong, Yong
    Wang, Lishun
    [J]. SENSORS, 2022, 22 (21)
  • [7] Joint Learning of Local and Global Context for Temporal Action Proposal Generation
    Lin, Tianwei
    Zhao, Xu
    Su, Haisheng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (12) : 4899 - 4912
  • [8] Local and Global Context Reasoning for Spatio-Temporal Action Localization
    Ando, Ryuhei
    Babazaki, Yasunori
    Takahashi, Katsuhiko
    [J]. ADVANCES IN VISUAL COMPUTING, ISVC 2023, PT I, 2023, 14361 : 147 - 159
  • [9] Local–Global Transformer Neural Network for temporal action segmentation
    Tian, Xiaoyan
    Jin, Ye
    Tang, Xianglong
    [J]. Multimedia Systems, 2023, 29 : 615 - 626
  • [10] Spatio-Temporal Motion Aggregation Network for Video Action Detection
    Zhang, Hongcheng
    Zhao, Xu
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2180 - 2184