Temporal Deformable Transformer for Action Localization

Cited by: 0

Authors:

Wang, Haoying [1]

Wei, Ping [1]

Liu, Meiqin [1]

Zheng, Nanning [1]

Affiliations:

[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VI | 2023 / Vol. 14259

Funding:

National Natural Science Foundation of China

Keywords:

Temporal Action Localization; Transformer; Deformable Attention; Video Understanding

DOI:

10.1007/978-3-031-44223-0_45

Chinese Library Classification:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Temporal action localization (TAL) is a challenging task that has received significant attention in video understanding. Recently, Transformer-based models have demonstrated their effectiveness in capturing contextual information and achieved outstanding performance on various TAL benchmarks. However, these methods still face challenges in computational efficiency and contextual modeling rigidity. In this paper, we propose a method to address those problems in Transformer-based models. Our model introduces a temporal deformable Transformer module and the corresponding time normalization, enabling flexible aggregation of temporal context information in videos, leading to enhanced video representations. To demonstrate the effectiveness of the proposed method, we construct a Transformer-based anchor-free model with a simple prediction head, which yields superior performance on widely used benchmarks. Specifically, it achieves an average mAP of 67.4% on THUMOS14 and an average mAP of 36.8% on ActivityNet-v1.3.

Pages: 563-575

Number of pages: 13

50 items in total

[1] An Adaptive Dual Selective Transformer for Temporal Action Localization
Li, Qiang
Zu, Guang
Xu, Hui
Kong, Jun
Zhang, Yanni
Wang, Jianzhong
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7398 - 7412
[2] TEST: Temporal-spatial separated transformer for temporal action localization
Wan, Herun
Luo, Minnan
Li, Zhihui
Wang, Yang
NEUROCOMPUTING, 2025, 614
[3] A Multitemporal Scale and Spatial-Temporal Transformer Network for Temporal Action Localization
Gao, Zan
Cui, Xinglei
Zhuo, Tao
Cheng, Zhiyong
Liu, An-An
Wang, Meng
Chen, Shenyong
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2023, 53 (03) : 569 - 580
[4] Cross Time-Frequency Transformer for Temporal Action Localization
Yang, Jin
Wei, Ping
Zheng, Nanning
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (06) : 4625 - 4638
[5] Gated Multi-Scale Transformer for Temporal Action Localization
Yang, Jin
Wei, Ping
Ren, Ziyang
Zheng, Nanning
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5705 - 5717
[6] Multi-granularity transformer fusion for temporal action localization
Zhang M.
Hu H.
Li Z.
Soft Computing, 2024, 28 (20) : 12377 - 12388
[7] TALLFormer: Temporal Action Localization with a Long-Memory Transformer
Cheng, Feng
Bertasius, Gedas
COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694 : 503 - 521
[8] W-ART: Action Relation Transformer for Weakly-Supervised Temporal Action Localization
Li, Mengzhu
Wu, Hongjun
Liu, Yongcheng
Liu, Hongzhe
Xu, Cheng
Li, Xuewei
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2195 - 2199
[9] Actionness-Guided Transformer for Anchor-Free Temporal Action Localization
Zhao, Peisen
Xie, Lingxi
Zhang, Ya
Tian, Qi
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 194 - 198
[10] HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization
Reza, Sakib
Zhang, Yuexi
Moghaddam, Mohsen
Camps, Octavia
COMPUTER VISION - ECCV 2024, PT XXI, 2025, 15079 : 205 - 222
