Dynamic Audio-Visual Biometric Fusion for Person Recognition

Cited by: 7
Authors
Alsaedi, Najlaa Hindi [1]
Jaha, Emad Sami [1]
Affiliations
[1] King Abdulaziz Univ, Fac Comp Sci & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2022 / Vol. 71 / No. 01
Keywords
Biometrics; dynamic fusion; feature fusion; identification; multimodal biometrics; occluded face recognition; quality-based recognition; verification; voice recognition; FEATURE LEVEL FUSION; FACE RECOGNITION; VOICE; SCORE
DOI
10.32604/cmc.2022.021608
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Biometric recognition is the process of recognizing a person's identity from physiological or behavioral modalities such as face, voice, fingerprint, and gait. These modalities are used either separately, as in unimodal systems, or in combinations of two or more, as in multimodal systems. Multimodal systems can usually improve recognition performance over unimodal systems by integrating the biometric data of multiple modalities at different fusion levels. Despite this improvement, factors such as occlusion, face pose, and noise in voice data degrade the performance of multimodal systems in real-life applications. In this paper, we propose two algorithms that apply dynamic fusion at the feature level based on the data quality of multimodal biometrics. The proposed algorithms attempt to minimize the negative influence of confusing and low-quality features, by either exclusion or weight reduction, to achieve better recognition performance. The dynamic fusion was implemented using face and voice biometrics: face features were extracted using principal component analysis (PCA) and Gabor filters separately, whereas voice features were extracted using Mel-Frequency Cepstral Coefficients (MFCCs). Face image quality is assessed mainly by the presence of occlusion, whereas voice data quality is assessed largely by the signal-to-noise ratio (SNR), which reflects the presence of noise. To evaluate the proposed algorithms, several experiments were conducted using two combinations of three databases: the AR database and the Extended Yale Face Database B for face images, and the VOiCES database for voice data. The results show that both proposed dynamic fusion algorithms improve performance and offer advantages in identification and verification over not only the standard unimodal algorithms but also multimodal algorithms that use standard fusion methods.
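The abstract describes quality-driven fusion at the feature level, where low-quality features are either excluded or down-weighted. The following is a minimal sketch of that general idea only, not the authors' two algorithms, which the abstract does not specify. The function names (`snr_db`, `snr_to_quality`, `dynamic_fuse`), the 0 to 30 dB mapping range, and the `min_quality` threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from a signal segment and a noise-only segment."""
    p_signal = np.mean(np.asarray(signal, dtype=float) ** 2)
    p_noise = np.mean(np.asarray(noise, dtype=float) ** 2)
    return 10.0 * np.log10(p_signal / p_noise)


def snr_to_quality(snr, lo_db=0.0, hi_db=30.0):
    """Map an SNR in dB linearly to a quality score in [0, 1].

    The lo_db/hi_db bounds are illustrative assumptions, not values from the paper.
    """
    return float(np.clip((snr - lo_db) / (hi_db - lo_db), 0.0, 1.0))


def l2_normalize(v):
    """Scale a feature vector to unit length so modalities are comparable."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def dynamic_fuse(face_feat, voice_feat, face_quality, voice_quality, min_quality=0.3):
    """Quality-weighted feature-level fusion by concatenation.

    A modality whose quality falls below min_quality (an assumed threshold) is
    excluded by giving it weight 0; otherwise its weight is proportional to its
    quality. The output length is fixed, so excluded features become zeros.
    """
    qualities = np.array([face_quality, voice_quality], dtype=float)
    weights = np.where(qualities < min_quality, 0.0, qualities)
    if weights.sum() == 0:
        raise ValueError("both modalities are below the quality threshold")
    weights = weights / weights.sum()
    return np.concatenate([
        weights[0] * l2_normalize(face_feat),
        weights[1] * l2_normalize(voice_feat),
    ])


# Example: an occluded face (low quality) fused with a clean voice sample.
fused = dynamic_fuse([3.0, 4.0], [0.0, 2.0], face_quality=0.1, voice_quality=0.8)
```

In this sketch the face quality score would come from an occlusion detector and the voice quality score from `snr_to_quality(snr_db(...))`; both are passed in as plain numbers so the fusion step stays independent of how quality is measured.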
Pages: 1283-1311
Page count: 29