Dual Attention Matching for Audio-Visual Event Localization

Cited by: 131
Authors
Wu, Yu [1, 2]
Zhu, Linchao [2]
Yan, Yan [3]
Yang, Yi [2]
Affiliations
[1] Baidu Res, Beijing, Peoples R China
[2] Univ Technol Sydney, ReLER, Sydney, NSW, Australia
[3] Texas State Univ, San Marcos, TX USA
Keywords
DOI
10.1109/ICCV.2019.00639
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the audio-visual event localization problem. This task is to localize a visible and audible event in a video. Previous methods first divide a video into short segments, and then fuse visual and acoustic features at the segment level. The duration of these segments is usually short, so the visual and acoustic features of each segment may not be well aligned. Direct concatenation of the two features at the segment level can be vulnerable to a minor temporal misalignment of the two signals. We propose a Dual Attention Matching (DAM) module to cover a longer video duration for better high-level event information modeling, while the local temporal information is attained by the global cross-check mechanism. Our premise is that one should watch the whole video to understand the high-level event, while shorter segments should be checked in detail for localization. Specifically, the global feature of one modality queries the local feature in the other modality in a bi-directional way. With temporal co-occurrence encoded between auditory and visual signals, DAM can be readily applied in various audio-visual event localization tasks, e.g., cross-modality localization and supervised event localization. Experiments on the AVE dataset show our method outperforms the state-of-the-art by a large margin.
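The bi-directional global-to-local query described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mean pooling used for the global feature, the sigmoid of a dot product used as the matching score, the averaging of the two directions, and all function names are assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(segments: Sequence[Vector]) -> List[float]:
    """Average per-segment features into one global feature (assumed pooling)."""
    n = len(segments)
    return [sum(column) / n for column in zip(*segments)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cross_check(global_feat: Vector, local_feats: Sequence[Vector]) -> List[float]:
    """Score every local segment against a global feature (assumed: sigmoid of dot product)."""
    return [
        sigmoid(sum(g * l for g, l in zip(global_feat, segment)))
        for segment in local_feats
    ]


def dual_attention_matching(
    audio: Sequence[Vector], visual: Sequence[Vector]
) -> List[float]:
    """Return one relevance score per segment from a two-way global-to-local query."""
    audio_global = mean_pool(audio)
    visual_global = mean_pool(visual)
    # The global visual feature queries local audio segments, and vice versa.
    audio_scores = cross_check(visual_global, audio)
    visual_scores = cross_check(audio_global, visual)
    # Assumed fusion: average the two directions for each segment.
    return [(a + v) / 2.0 for a, v in zip(audio_scores, visual_scores)]


# Toy input: three segments with 2-dimensional features per modality.
# The two modalities agree on the first two segments and disagree on the last.
audio = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
visual = [[1.0, 0.0], [1.0, 0.0], [0.0, -1.0]]
scores = dual_attention_matching(audio, visual)
print(scores)
```

In this toy input the last segment receives a lower score than the first two, which is the behaviour one would expect from a segment whose audio and visual content do not match the rest of the video. Real features would come from pretrained audio and visual encoders, which this sketch leaves out.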
Pages: 6301-6309
Number of pages: 9