Cross Time-Frequency Transformer for Temporal Action Localization

Cited by: 1
Authors
Yang, Jin [1]
Wei, Ping [1]
Zheng, Nanning [1]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Shaanxi, Peoples R China
Keywords
Time-frequency analysis; Feature extraction; Transformers; Location awareness; Logic gates; Task analysis; Discrete wavelet transforms; Temporal action localization; transformer; cross time-frequency features; NETWORK;
DOI
10.1109/TCSVT.2023.3326692
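Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract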
Most modern approaches to temporal action localization (TAL) focus on time-domain information and neglect the advantages of information from other domains. How to effectively use information from different domains, together with their interactions, remains an attractive yet challenging issue in TAL. In this paper, we propose a novel cross time-frequency Transformer model (TFFormer) for TAL. A dual-branch network architecture is designed to capture time and frequency features at multiple scales, using a multi-scale transformer in the time branch and the DB1 Discrete Wavelet Transform (DWT) in the frequency branch. To fuse the features from the two domains, we propose a cross time-frequency attention mechanism that includes a time pathway and a frequency pathway, enhancing the interaction between the temporal and frequency features. Furthermore, a gated control mechanism is designed to aggregate features from different scales, characterizing the respective contribution of each scale. We also design a new regression loss function for locating the time boundaries. Extensive experiments were carried out on four challenging benchmark datasets, including two third-person datasets and two first-person datasets. The proposed method achieves impressive results on these datasets. Specifically, TFFormer achieves an average mAP of 23.2% on Ego4D and 25.6% on EPIC-Kitchens 100, outperforming previous state-of-the-art methods by a large margin. It also obtains competitive results on ActivityNet v1.3 and THUMOS14, with average mAPs of 36.2% and 67.8%, respectively. We also conducted extensive ablation studies to validate the effectiveness of each component of the proposed method.
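The abstract names the DB1 (Haar) DWT as the operator in the frequency branch and says features are captured at multiple scales. The sketch below is not the authors' implementation; it only illustrates, under stated assumptions, what a multi-level Haar decomposition of a feature sequence looks like. The input layout `(T, C)` (time steps by channels), the function names `haar_dwt_step` and `haar_pyramid`, and the choice to recurse on the approximation band are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt_step(x):
    """One level of the db1 (Haar) DWT along the time axis.

    x: array of shape (T, C) with even T (assumed layout, not from the paper).
    Returns (approx, detail), each of shape (T // 2, C).
    """
    if x.shape[0] % 2 != 0:
        raise ValueError("sequence length must be even")
    even, odd = x[0::2], x[1::2]
    # Low-pass band: scaled sum of neighbouring time steps.
    approx = (even + odd) / np.sqrt(2.0)
    # High-pass band: scaled difference of neighbouring time steps.
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


def haar_pyramid(x, levels):
    """Decompose the approximation band repeatedly.

    Returns a list of (approx, detail) pairs, one per level, with the
    temporal length halved at each level.
    """
    bands = []
    current = x
    for _ in range(levels):
        current, detail = haar_dwt_step(current)
        bands.append((current, detail))
    return bands
```

For example, `haar_pyramid(np.random.rand(64, 8), levels=3)` yields bands of temporal length 32, 16, and 8. Because the Haar transform is orthonormal, the energy of each input equals the summed energy of its approximation and detail bands, which is a convenient sanity check.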
Pages: 4625-4638
Page count: 14