Dynamic interactive learning network for audio-visual event localization

Cited by: 0

Authors:

Chen, Jincai [1,2]

Liang, Han [1,3]

Wang, Ruili [3]

Zeng, Jiangfeng [4,5]

Lu, Ping [2]

Affiliations:

[1] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Peoples R China

[2] Huazhong Univ Sci & Technol, Inst Nat & Math Sci, Wuhan, Peoples R China

[3] Massey Univ, Inst Nat & Math Sci, Auckland, New Zealand

[4] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China

[5] Ctr Data Governance & Intelligent Decis Making Hub, Wuhan, Peoples R China

Source:

APPLIED INTELLIGENCE | 2023 / Volume 53 / Issue 24

Funding:

National Natural Science Foundation of China;

Keywords:

Audio-visual event localization; Dynamic fusion; Attention mechanism; Difference loss; CROSS-MODAL ATTENTION;

DOI:

10.1007/s10489-023-05146-7

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Audio-visual event (AVE) localization aims to detect whether an event exists in each video segment and predict its category. Only when the event is audible and visible can it be recognized as an AVE. However, sometimes the information from auditory and visual modalities is asymmetrical in a video sequence, leading to incorrect predictions. To address this challenge, we introduce a dynamic interactive learning network designed to dynamically explore the intra- and inter-modal relationships depending on the other modality for better AVE localization. Specifically, our approach involves a dynamic fusion attention of intra- and inter-modalities module, enabling the auditory and visual modalities to focus more on regions deemed informative by the other modality while focusing less on regions that the other modality considers noise. In addition, we introduce an audio-visual difference loss to reduce the distance between auditory and visual representations. Our proposed method has been demonstrated to have superior performance by extensive experimental results on the AVE dataset. The source code will be available at https://github.com/hanliang/DILN.
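
The abstract states only that the audio-visual difference loss reduces the distance between auditory and visual representations; it does not give the formula. As a rough illustration of that idea, the sketch below assumes a mean squared Euclidean distance over paired per-segment feature vectors. The function name `difference_loss` and the choice of distance are hypothetical and may differ from the paper's definition.

```python
from typing import Sequence


def difference_loss(
    audio: Sequence[Sequence[float]],
    visual: Sequence[Sequence[float]],
) -> float:
    """Mean squared Euclidean distance between paired segment features.

    Assumed form for illustration only; the paper's actual loss is not
    specified in the abstract.
    """
    if len(audio) != len(visual) or len(audio) == 0:
        raise ValueError("audio and visual need the same non-zero number of segments")
    total = 0.0
    for a, v in zip(audio, visual):
        if len(a) != len(v):
            raise ValueError("paired feature vectors must have the same dimension")
        # Squared Euclidean distance for one segment.
        total += sum((x - y) ** 2 for x, y in zip(a, v))
    # Average over segments so the value does not grow with video length.
    return total / len(audio)


# Two segments with 2-dimensional features; the second pair is identical.
audio_feats = [[0.0, 0.0], [1.0, 1.0]]
visual_feats = [[3.0, 4.0], [1.0, 1.0]]
print(difference_loss(audio_feats, visual_feats))  # 12.5
```

Minimizing a term of this kind alongside the classification objective pulls the two modality representations of the same segment closer together, which is the effect the abstract describes.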


Pages: 30431 - 30442

Page count: 12

50 entries in total

[1] Dynamic interactive learning network for audio-visual event localization
Jincai Chen
Han Liang
Ruili Wang
Jiangfeng Zeng
Ping Lu
[J]. Applied Intelligence, 2023, 53 : 30431 - 30442
[2] Dual Perspective Network for Audio-Visual Event Localization
Rao, Varshanth
Khalil, Md Ibrahim
Li, Haoda
Dai, Peng
Lu, Juwei
[J]. COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694 : 689 - 704
[3] Learning Event-Specific Localization Preferences for Audio-Visual Event Localization
Ge, Shiping
Jiang, Zhiwei
Yin, Yafeng
Wang, Cong
Cheng, Zifeng
Gu, Qing
[J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3446 - 3454
[4] Dense Modality Interaction Network for Audio-Visual Event Localization
Liu, Shuo
Quan, Weize
Wang, Chaoqun
Liu, Yuan
Liu, Bin
Yan, Dong-Ming
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2734 - 2748
[5] Audio-Visual Event Localization in Unconstrained Videos
Tian, Yapeng
Shi, Jing
Li, Bochen
Duan, Zhiyao
Xu, Chenliang
[J]. COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 252 - 268
[6] BI-DIRECTIONAL MODALITY FUSION NETWORK FOR AUDIO-VISUAL EVENT LOCALIZATION
Liu, Shuo
Quan, Weize
Liu, Yuan
Yan, Dong-Ming
[J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4868 - 4872
[7] Audio-Visual Event Localization by Learning Spatial and Semantic Co-Attention
Xue, Cheng
Zhong, Xionghu
Cai, Minjie
Chen, Hao
Wang, Wenwu
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 418 - 429
[8] Dual Attention Matching for Audio-Visual Event Localization
Wu, Yu
Zhu, Linchao
Yan, Yan
Yang, Yi
[J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6301 - 6309
[9] Semantic and Relation Modulation for Audio-Visual Event Localization
Wang, Hao
Zha, Zheng-Jun
Li, Liang
Chen, Xuejin
Luo, Jiebo
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7711 - 7725
[10] Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization
Xuan, Hanyu
Zhang, Zhenyu
Chen, Shuo
Yang, Jian
Yan, Yan
[J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 279 - 286
