NOISE-TOLERANT AUDIO-VISUAL ONLINE PERSON VERIFICATION USING AN ATTENTION-BASED NEURAL NETWORK FUSION

Cited by: 0
Authors
Shon, Suwon [1]
Oh, Tae-Hyun [1]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
person verification; recognition; multi-modal; cross-modal; attention; missing data
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we present a multi-modal online person verification system using both speech and visual signals. Inspired by neuroscientific findings on the association of voice and face, we propose an attention-based end-to-end neural network that learns multi-sensory association for the task of person verification. The attention mechanism in our proposed network learns to conditionally select a salient modality between speech and facial representations that provides a balance between complementary inputs. By virtue of this capability, the network is robust to missing or corrupted data from either modality. In the VoxCeleb2 dataset, we show that our method performs favorably against competing multi-modal methods. Even for extreme cases of large corruption or missing data on either modality, our method demonstrates robustness over other unimodal methods.
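As a rough illustration of the attention-based selection between modalities that the abstract describes, the sketch below scores a speech embedding and a face embedding, normalizes the two scores with a softmax, and returns their weighted sum. It is a minimal sketch under stated assumptions, not the authors' network: the function names `softmax` and `attention_fuse`, the single linear scoring layer, and the `weights` and `biases` parameters are all hypothetical, and a real system would learn these parameters end to end.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(speech_emb, face_emb, weights, biases):
    """Fuse two equal-length embeddings with softmax attention.

    Assumption (not from the source): attention scores come from one
    linear layer applied to the concatenated embeddings. `weights` has
    two rows of length 2 * d and `biases` has two scalars, one per
    modality. Both are hypothetical parameters.
    """
    joint = speech_emb + face_emb  # list concatenation, length 2 * d
    scores = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(weights, biases)
    ]
    alpha = softmax(scores)  # alpha[0]: speech weight, alpha[1]: face weight
    fused = [alpha[0] * s + alpha[1] * f for s, f in zip(speech_emb, face_emb)]
    return fused, alpha


# Zero parameters give equal weights; a large bias gap favors one modality.
fused, alpha = attention_fuse(
    [1.0, 0.0], [0.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0]
)
fused_speech, alpha_speech = attention_fuse(
    [1.0, 0.0], [0.0, 1.0], [[0.0] * 4, [0.0] * 4], [10.0, -10.0]
)
```

With this kind of weighting, a corrupted or missing modality can in principle be down-weighted by pushing its attention score low, which is the robustness property the abstract claims for the proposed network.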
Pages: 3995-3999
Number of pages: 5