DCAN: Improving Temporal Action Detection via Dual Context Aggregation

Cited by: 0

Authors:

Chen, Guo [1]

Zheng, Yin-Dong [1]

Wang, Limin [1]

Lu, Tong [1]

Affiliation:

[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Temporal action detection aims to locate the boundaries of actions in a video. Current methods based on boundary matching enumerate and evaluate all possible boundary matchings to generate proposals. However, these methods neglect long-range context aggregation in boundary prediction. Moreover, because adjacent matchings have similar semantics, local semantic aggregation of densely generated matchings cannot improve semantic richness and discrimination. In this paper, we propose an end-to-end proposal generation method named Dual Context Aggregation Network (DCAN) that aggregates context on two levels, the boundary level and the proposal level, to generate high-quality action proposals and thereby improve the performance of temporal action detection. Specifically, we design Multi-Path Temporal Context Aggregation (MTCA) to achieve smooth context aggregation on the boundary level and precise evaluation of boundaries. For matching evaluation, Coarse-to-Fine Matching (CFM) is designed to aggregate context on the proposal level and refine the matching map from coarse to fine. We conduct extensive experiments on ActivityNet v1.3 and THUMOS-14. DCAN obtains an average mAP of 35.39% on ActivityNet v1.3 and reaches an mAP of 54.1% at IoU@0.5 on THUMOS-14, which demonstrates that DCAN can generate high-quality proposals and achieve state-of-the-art performance. We release the code at https://github.com/cg1177/DCAN.
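As a rough illustration of two ideas the abstract relies on, the sketch below enumerates every candidate (start, end) matching over a short snippet sequence and scores candidates against a ground-truth segment by temporal IoU, the overlap measure behind a figure such as "mAP at IoU@0.5". It is not taken from the DCAN code base: the names `enumerate_matchings`, `temporal_iou`, `num_snippets` and `max_duration`, and the toy values, are assumptions made for this example only.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), measured in snippet units


def enumerate_matchings(num_snippets: int, max_duration: int) -> List[Segment]:
    """List every (start, end) pair with 1 <= end - start <= max_duration."""
    proposals: List[Segment] = []
    for start in range(num_snippets):
        for duration in range(1, max_duration + 1):
            end = start + duration
            if end <= num_snippets:  # discard candidates that run past the video
                proposals.append((float(start), float(end)))
    return proposals


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two one-dimensional intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Toy setting (hypothetical values): 4 snippets, proposals up to 3 snippets long.
proposals = enumerate_matchings(num_snippets=4, max_duration=3)
ground_truth = (1.0, 3.0)

# A proposal counts as a hit at the 0.5 threshold if its IoU is at least 0.5.
matches = [p for p in proposals if temporal_iou(p, ground_truth) >= 0.5]
```

The dense enumeration grows with the product of sequence length and maximum duration, which is why neighbouring candidates overlap heavily and tend to carry similar semantics, the issue the abstract raises for local aggregation.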

Pages: 248-257

Page count: 10
