DCAN: Improving Temporal Action Detection via Dual Context Aggregation

Times cited: 0
Authors
Chen, Guo [1]
Zheng, Yin-Dong [1]
Wang, Limin [1]
Lu, Tong [1]
Affiliation
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal action detection aims to locate the boundaries of actions in videos. Current methods based on boundary matching enumerate and score all possible boundary matchings to generate proposals. However, these methods neglect long-range context aggregation in boundary prediction. Moreover, because adjacent matchings have similar semantics, locally aggregating the densely generated matchings cannot improve their semantic richness or discrimination. In this paper, we propose an end-to-end proposal generation method, the Dual Context Aggregation Network (DCAN), which aggregates context at two levels, the boundary level and the proposal level, to generate high-quality action proposals and thereby improve temporal action detection. Specifically, we design Multi-Path Temporal Context Aggregation (MTCA) to achieve smooth context aggregation at the boundary level and precise boundary evaluation. For matching evaluation, we design Coarse-to-Fine Matching (CFM) to aggregate context at the proposal level and refine the matching map from coarse to fine. We conduct extensive experiments on ActivityNet v1.3 and THUMOS-14. DCAN obtains an average mAP of 35.39% on ActivityNet v1.3 and an mAP of 54.1% at IoU 0.5 on THUMOS-14, demonstrating that DCAN generates high-quality proposals and achieves state-of-the-art performance. The code is available at https://github.com/cg1177/DCAN.
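The abstract refers to boundary-matching methods that enumerate and score every candidate (start, end) pair, and it reports mAP at an IoU threshold of 0.5. The sketch below illustrates only these two generic ideas; it is not DCAN's MTCA or CFM, whose internals the record does not describe. It assumes a BMN-style layout (an assumption, not stated in the source): per-position start and end probabilities plus a matching-confidence map indexed as `match_conf[duration - 1][start]`. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def enumerate_matchings(num_positions: int, max_duration: int) -> List[Tuple[int, int]]:
    """Enumerate every (start, end) pair with 1 <= end - start <= max_duration.

    Assumption: positions are discrete temporal indices 0..num_positions-1,
    and a proposal covers the half-open interval [start, end).
    """
    pairs = []
    for start in range(num_positions):
        for duration in range(1, max_duration + 1):
            end = start + duration
            if end > num_positions:  # proposal would run past the video end
                break
            pairs.append((start, end))
    return pairs


def score_proposals(
    start_prob: Sequence[float],
    end_prob: Sequence[float],
    match_conf: Sequence[Sequence[float]],
    max_duration: int,
) -> List[Tuple[int, int, float]]:
    """Score each boundary matching and return proposals sorted by score.

    Hypothetical BMN-style scoring: start probability at `start`, times end
    probability at the last covered index `end - 1`, times the matching
    confidence stored at match_conf[duration - 1][start].
    """
    proposals = []
    for s, e in enumerate_matchings(len(start_prob), max_duration):
        score = start_prob[s] * end_prob[e - 1] * match_conf[e - s - 1][s]
        proposals.append((s, e, score))
    return sorted(proposals, key=lambda p: p[2], reverse=True)


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Temporal IoU of two intervals given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

At IoU 0.5, a predicted segment would typically count as a true positive when `temporal_iou(pred, gt) >= 0.5`; the average mAP reported for ActivityNet v1.3 averages mAP over several such IoU thresholds.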
Pages: 248-257
Number of pages: 10
Related papers (50 in total)
  • [1] GLFormer: Global and Local Context Aggregation Network for Temporal Action Detection
    He, Yilong
    Zhong, Yong
    Wang, Lishun
    Dang, Jiachen
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (17):
  • [2] Temporal Context Aggregation Network for Temporal Action Proposal Refinement
    Qing, Zhiwu
    Su, Haisheng
    Gan, Weihao
    Wang, Dongliang
    Wu, Wei
    Wang, Xiang
    Qiao, Yu
    Yan, Junjie
    Gao, Changxin
    Sang, Nong
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 485 - 494
  • [3] MBGNet: Multi-branch boundary generation network with temporal context aggregation for temporal action detection
    Pan, Xiaoying
    Zhang, Nijuan
    Xie, Hewei
    Li, Shoukun
    Feng, Tong
    [J]. APPLIED INTELLIGENCE, 2024, 54 (19) : 9045 - 9066
  • [4] Online Action Tube Detection via Resolving the Spatio-temporal Context Pattern
    Huang, Jingjia
    Li, Nannan
    Zhong, Jiaxing
    Li, Thomas H.
    Li, Ge
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 993 - 1001
  • [5] LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation
    Zhou, Keyi
    Li, Li
    Zhou, Wengang
    Wang, Yonghui
    Feng, Hao
    Li, Houqiang
    [J]. arXiv,
  • [6] Improving Action Recognition via Temporal and Complementary Learning
    Elmadany, Nour Eldin
    He, Yifeng
    Guan, Ling
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2021, 12 (03)
  • [7] Temporal Context Enhanced Feature Aggregation for Video Object Detection
    He, Fei
    Gao, Naiyu
    Li, Qiaozhe
    Du, Senyao
    Zhao, Xin
    Huang, Kaiqi
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10941 - 10948
  • [8] Local and global context cooperation for temporal action detection
    Wu, Lanxi
    Xu, Luhui
    [J]. Multimedia Systems, 2024, 30 (06)
  • [9] CAA: Candidate-Aware Aggregation for Temporal Action Detection
    Ren, Yifan
    Xu, Xing
    Shen, Fumin
    Yao, Yazhou
    Lu, Huimin
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4930 - 4938
  • [10] SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION
    Zhang, Hongcheng
    Zhao, Xu
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2180 - 2184