Attention-Based Fusion of Ultrashort Voice Utterances and Depth Videos for Multimodal Person Identification

Cited by: 1
Authors
Moufidi, Abderrazzaq [1, 2]
Rousseau, David [2]
Rasti, Pejman [1, 2]
Affiliations
[1] ESAIP, Ctr Etud & Rech Aide Decis CERADE, 18 Rue 8 Mai 1945, F-49124 St Barthelemy Anjou, France
[2] Univ Angers, Lab Angevin Rech Ingn Syst LARIS, UMR INRAe IRHS, 62 Ave Notre Dame Lac, F-49000 Angers, France
Keywords
depth images; lip identification; speaker identification; late fusion; multimodality; spatiotemporal
DOI
10.3390/s23135890
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
In biometrics, multimodal deep learning typically depends on long speech utterances and RGB images, which are often impractical to collect. This paper addresses these issues by using ultrashort voice utterances and depth videos of the lips for person identification. The proposed method combines residual neural networks, which encode the depth videos, with a Time Delay Neural Network (TDNN) architecture, which encodes the voice signals. To fuse information from the two modalities, we integrate self-attention and design a noise-resistant model that handles diverse types of noise. In tests on a benchmark dataset, our approach outperforms existing methods, with an average improvement of 10%. The method is particularly suited to scenarios where long utterances and RGB images are infeasible or unavailable. Its potential also extends to multimodal applications beyond person identification.
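The fusion step mentioned in the abstract can be illustrated with a minimal sketch of scaled dot-product self-attention over two modality embeddings. This is not the authors' implementation: the function name `self_attention_fuse`, the four-dimensional toy embeddings, the use of identity projections in place of learned query/key/value weights, and the mean pooling at the end are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(tokens):
    """Fuse per-modality embeddings with scaled dot-product self-attention.

    tokens: list of equal-length vectors, one per modality.
    Assumption: identity projections stand in for the learned
    query/key/value matrices a trained model would use.
    Returns (fused_vector, attention_weights).
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights, attended = [], []
    for q in tokens:
        # Similarity of this token to every token, including itself.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all tokens gives the attended representation.
        attended.append(
            [sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)]
        )
    # Mean-pool the attended tokens into one fused embedding (an assumption).
    fused = [sum(tok[i] for tok in attended) / len(attended) for i in range(d)]
    return fused, weights


# Hypothetical toy embeddings standing in for the encoder outputs.
voice_emb = [0.2, -0.1, 0.7, 0.05]  # e.g. from a TDNN voice encoder
lip_emb = [0.4, 0.3, -0.2, 0.6]     # e.g. from a residual depth-video encoder
fused, weights = self_attention_fuse([voice_emb, lip_emb])
```

Each row of `weights` sums to 1 and indicates how much one modality draws on the other; a trained model would learn the projections so that a noisy modality receives less weight.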
Pages: 13
Related papers
50 in total
  • [41] Person image generation with attention-based injection network
    Liu, Meichen
    Wang, Kejun
    Ji, Ruihang
    Ge, Shuzhi Sam
    Chen, Jing
    NEUROCOMPUTING, 2021, 460 : 345 - 359
  • [42] Residual Attention-based Fusion for Video Classification
    Pouyanfar, Samira
    Wang, Tianyi
    Chen, Shu-Ching
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 478 - 480
  • [43] Attention-based Surgical Phase Boundaries Detection in Laparoscopic Videos
    Namazi, Babak
    Sankaranarayanan, Ganesh
    Devarajan, Venkat
    2019 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2019), 2019, : 577 - 583
  • [44] Domain adaptive attention-based dropout for one-shot person re-identification
    Xulin Song
    Zhong Jin
    International Journal of Machine Learning and Cybernetics, 2022, 13 : 255 - 268
  • [45] Domain adaptive attention-based dropout for one-shot person re-identification
    Song, Xulin
    Jin, Zhong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (01) : 255 - 268
  • [46] Attention-based Model with Attribute Classification for Cross-domain Person Re-identification
    Xu, Simin
    Luo, Lingkun
    Hu, Shiqiang
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9149 - 9155
  • [47] A Multi-Scale Graph Attention-Based Transformer for Occluded Person Re-Identification
    Ma, Ming
    Wang, Jianming
    Zhao, Bohan
    Applied Sciences (Switzerland), 14 (18):
  • [48] Attention-Based Grasp Detection With Monocular Depth Estimation
    Xuan Tan, Phan
    Hoang, Dinh-Cuong
    Nguyen, Anh-Nhat
    Nguyen, Van-Thiep
    Vu, Van-Duc
    Nguyen, Thu-Uyen
    Hoang, Ngoc-Anh
    Phan, Khanh-Toan
    Tran, Duc-Thanh
    Vu, Duy-Quang
    Ngo, Phuc-Quan
    Duong, Quang-Tri
    Ho, Ngoc-Trung
    Tran, Cong-Trinh
    Duong, Van-Hiep
    Mai, Anh-Truong
    IEEE ACCESS, 2024, 12 : 65041 - 65057
  • [49] Marfusion: An Attention-Based Multimodal Fusion Model for Human Activity Recognition in Real-World Scenarios
    Zhao, Yunhan
    Guo, Siqi
    Chen, Zeqi
    Shen, Qiang
    Meng, Zhengyuan
    Xu, Hao
    APPLIED SCIENCES-BASEL, 2022, 12 (11):
  • [50] Attention in Multimodal Neural Networks for Person Re-identification
    Leibolle, Aske R.
    Krogh, Benjamin
    Nasrollahi, Kamal
    Moeslund, Thomas B.
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 292 - 300