Attention-Based Fusion of Ultrashort Voice Utterances and Depth Videos for Multimodal Person Identification

Cited by: 1
Authors
Moufidi, Abderrazzaq [1, 2]
Rousseau, David [2]
Rasti, Pejman [1, 2]
Affiliations
[1] ESAIP, Ctr Etud & Rech Aide Decis CERADE, 18 Rue 8 Mai 1945, F-49124 St Barthelemy Anjou, France
[2] Univ Angers, Lab Angevin Rech Ingn Syst LARIS, UMR INRAe IRHS, 62 Ave Notre Dame Lac, F-49000 Angers, France
Keywords
depth images; lip identification; speaker identification; late fusion; multimodality; spatiotemporal
DOI
10.3390/s23135890
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Discipline classification codes
070302; 081704
Abstract
Multimodal deep learning for biometrics faces significant challenges because it depends on long speech utterances and RGB images, which are often impractical to obtain. This paper presents a novel solution to these issues that leverages ultrashort voice utterances and depth videos of the lips for person identification. The proposed method combines residual neural networks, which encode the depth videos, with a Time Delay Neural Network (TDNN) architecture, which encodes the voice signals. To fuse information from these modalities, we integrate self-attention and design a noise-resistant model that handles diverse types of noise. In tests on a benchmark dataset, our approach outperforms existing methods, with an average improvement of 10%. The method is particularly efficient in scenarios where extended utterances and RGB images are unfeasible or unattainable. Furthermore, its potential extends to multimodal applications beyond person identification.
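The attention-based fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the embedding size, the random projection and classifier weights, the number of identities, and the names `fuse_with_self_attention`, `voice_emb`, and `lip_emb` are all hypothetical. The TDNN voice encoder and the residual-network lip-depth encoder are replaced by precomputed embedding vectors. In this sketch the two modality embeddings are stacked as a two-token sequence, passed through single-head scaled dot-product self-attention, and mean-pooled into one fused vector that a linear layer maps to identity scores.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_with_self_attention(voice_emb, lip_emb, w_q, w_k, w_v):
    """Hypothetical fusion: single-head self-attention over two modality tokens."""
    tokens = np.stack([voice_emb, lip_emb])              # shape (2, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v   # each (2, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (2, 2) token-to-token scores
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    attended = weights @ v                               # (2, d) attention-weighted values
    return attended.mean(axis=0), weights                # mean-pool into one fused vector


rng = np.random.default_rng(0)
d = 8      # hypothetical embedding size
n_ids = 5  # hypothetical number of enrolled identities

voice_emb = rng.standard_normal(d)  # stand-in for a TDNN voice embedding
lip_emb = rng.standard_normal(d)    # stand-in for a ResNet lip-depth embedding
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
w_cls = rng.standard_normal((d, n_ids))  # hypothetical linear identity classifier

fused, weights = fuse_with_self_attention(voice_emb, lip_emb, w_q, w_k, w_v)
predicted_id = int(np.argmax(fused @ w_cls))  # index of the highest-scoring identity
print(fused.shape)  # (8,)
```

Design note: the attention weights are computed from the inputs themselves, so each modality's contribution to the fused vector can vary from sample to sample instead of being fixed, as it would be with a plain average of the two embeddings. The abstract does not detail how the paper's model achieves its noise resistance.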
Pages: 13