Non-Local Temporal Difference Network for Temporal Action Detection

Cited by: 3
Authors
He, Yilong [1, 2]
Han, Xiao [1, 2]
Zhong, Yong [1, 2]
Wang, Lishun [1, 2]
Affiliations
[1] Chinese Acad Sci, Chengdu Inst Comp Applicat, Chengdu 610081, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
Keywords
temporal action detection; deep learning; convolutional neural networks; computer vision; video understanding
DOI
10.3390/s22218396
Chinese Library Classification
O65 [Analytical Chemistry]
Subject classification codes
070302; 081704
Abstract
As an important part of video understanding, temporal action detection (TAD) has a wide range of applications. It aims to simultaneously predict the temporal boundaries and class label of every action instance in an untrimmed video. Most existing TAD methods stack convolutional blocks to model long temporal structures. However, most of the information in adjacent frames is redundant, and information from distant frames is weakened after multiple convolution operations. In addition, the durations of action instances vary widely, making it difficult for single-scale modeling to fit complex video structures. To address these issues, we propose a non-local temporal difference network (NTD), which consists of a chunk convolution (CC) module, a multiple temporal coordination (MTC) module, and a temporal difference (TD) module. The TD module adaptively enhances motion information and boundary features with temporal attention weights. The CC module evenly divides the input sequence into N chunks and uses multiple independent convolution blocks to simultaneously extract features from neighboring chunks. It thereby passes information between distant frames rather than being confined to local convolution. The MTC module adopts a cascaded residual architecture that aggregates multiscale temporal features without introducing additional parameters. NTD achieves state-of-the-art performance on two large-scale datasets: 36.2% mAP@avg on ActivityNet-v1.3 and 71.6% mAP@0.5 on THUMOS-14.
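The mAP@0.5 figure quoted in the abstract depends on the temporal intersection-over-union (tIoU) between a predicted segment and a ground-truth segment: a prediction counts as correct only if its tIoU reaches the threshold. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function names `temporal_iou` and `is_true_positive` are hypothetical. Segments are assumed to be `(start, end)` pairs in seconds.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    # Length of the overlap; zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth instance if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # Predicted 2 s to 10 s against ground truth 0 s to 10 s.
    print(temporal_iou((2.0, 10.0), (0.0, 10.0)))
    print(is_true_positive((2.0, 10.0), (0.0, 10.0)))
```

A full mAP computation would additionally rank predictions by confidence and require the class label to match; that part is omitted here.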
Pages: 15