Dynamic Audio-Visual Biometric Fusion for Person Recognition

Cited by: 7

Authors:

Alsaedi, Najlaa Hindi [1]

Jaha, Emad Sami [1]

Affiliations:

[1] King Abdulaziz Univ, Fac Comp Sci & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2022 / Vol. 71 / Issue 01

Keywords:

Biometrics; dynamic fusion; feature fusion; identification; multimodal biometrics; occluded face recognition; quality-based recognition; verification; voice recognition; FEATURE LEVEL FUSION; FACE RECOGNITION; VOICE; SCORE

DOI:

10.32604/cmc.2022.021608

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Biometric recognition refers to the process of recognizing a person's identity using physiological or behavioral modalities, such as face, voice, fingerprint, and gait. Such modalities are used either separately, as in unimodal systems, or jointly, as in multimodal systems that combine two or more of them. Multimodal systems can usually improve recognition performance over unimodal systems by integrating the biometric data of multiple modalities at different fusion levels. Despite this improvement, in real-life applications some factors degrade the performance of multimodal systems, such as occlusion, face pose, and noise in voice data. In this paper, we propose two algorithms that apply dynamic fusion at the feature level based on the data quality of multimodal biometrics. The proposed algorithms attempt to minimize the negative influence of confusing and low-quality features, by either excluding them or reducing their weight, to achieve better recognition performance. The proposed dynamic fusion was implemented using face and voice biometrics, where face features were extracted using principal component analysis (PCA) and Gabor filters separately, whilst voice features were extracted using Mel-Frequency Cepstral Coefficients (MFCCs). Here, the quality assessment of face images is mainly based on the existence of occlusion, whereas the assessment of voice data quality is based substantially on the signal-to-noise ratio (SNR), which reflects the presence of noise. To evaluate the performance of the proposed algorithms, several experiments were conducted using two combinations of three databases: the AR database and the Extended Yale Face Database B for face images, and the VOiCES database for voice data. The obtained results show that both proposed dynamic fusion algorithms attain improved performance and offer more advantages in identification and verification over not only the standard unimodal algorithms but also the multimodal algorithms using standard fusion methods.
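The abstract does not give the two algorithms in detail, so the following is only a minimal sketch of the general idea it describes: a per-modality quality score decides whether a feature vector is excluded or down-weighted before feature-level concatenation. The function names, the linear quality-to-weight mapping, and the 5 dB and 25 dB SNR thresholds are all hypothetical assumptions for illustration, not the authors' method; the face weight is passed in directly, standing in for an occlusion-based quality estimate.

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from separate signal and noise samples."""
    p_signal = np.mean(np.square(signal))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10(p_signal / p_noise)

def quality_weight(quality, low, high):
    """Map a quality score linearly to [0, 1] (hypothetical mapping).

    Scores at or below `low` give weight 0 (the modality is excluded);
    scores at or above `high` give weight 1 (no reduction).
    """
    return float(np.clip((quality - low) / (high - low), 0.0, 1.0))

def l2_normalize(v):
    """Scale a vector to unit length so both modalities are comparable."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def dynamic_fuse(face_feat, voice_feat, w_face, w_voice):
    """Concatenate L2-normalized features, each scaled by its quality weight."""
    return np.concatenate([
        w_face * l2_normalize(np.asarray(face_feat, dtype=float)),
        w_voice * l2_normalize(np.asarray(voice_feat, dtype=float)),
    ])

# Toy usage with made-up numbers: a clean face and a moderately noisy voice.
voice_quality = snr_db(np.full(4, 10.0), np.ones(4))           # 20 dB
w_voice = quality_weight(voice_quality, low=5.0, high=25.0)   # assumed thresholds
fused = dynamic_fuse([3.0, 4.0], [0.0, 2.0], w_face=1.0, w_voice=w_voice)
```

In this sketch a weight of 0 reproduces the "exclusion" case and any weight between 0 and 1 reproduces the "weight reduction" case mentioned in the abstract; in practice the face and voice feature vectors would come from extractors such as PCA, Gabor filters, or MFCCs rather than the toy values used here.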


Pages: 1283-1311

Page count: 29

50 items in total

[1] Biometric person authentication with liveness detection based on audio-visual fusion
Chetty, Girija
Wagner, Michael
[J]. INTERNATIONAL JOURNAL OF BIOMETRICS, 2009, 1 (04) : 463 - 478
[2] Incremental Audio-Visual Fusion for Person Recognition in Earthquake Scene
You, Sisi
Zuo, Yukun
Yao, Hantao
Xu, Changsheng
[J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (02)
[3] Audio-visual biometric recognition by vector quantization
Das, Amitava
Ghosh, Prasanta
[J]. 2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 166 - +
[4] Intramodal and intermodal fusion for audio-visual biometric authentication
Cheung, MC
Mak, MW
Kung, SY
[J]. PROCEEDINGS OF THE 2004 INTERNATIONAL SYMPOSIUM ON INTELLIGENT MULTIMEDIA, VIDEO AND SPEECH PROCESSING, 2004, : 25 - 28
[5] Audio-Visual Sensor Fusion Framework Using Person Attributes Robust to Missing Visual Modality for Person Recognition
John, Vijay
Kawanishi, Yasutomo
[J]. MULTIMEDIA MODELING, MMM 2023, PT II, 2023, 13834 : 523 - 535
[6] Scene recognition with audio-visual sensor fusion
Devicharan, D
Mehrotra, KG
Mohan, CK
Varshney, PK
Zuo, L
[J]. Multisensor, Multisource Information Fusion: Architectures, Algorithms and Applications 2005, 2005, 5813 : 201 - 210
[7] Multifactor fusion for audio-visual speaker recognition
Chetty, Girija
Tran, Dat
[J]. LECTURE NOTES IN SIGNAL SCIENCE, INTERNET AND EDUCATION (SSIP'07/MIV'07/DIWEB'07), 2007, : 70 - +
[8] Bimodal fusion in audio-visual speech recognition
Zhang, XZ
Mersereau, RM
Clements, M
[J]. 2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL I, PROCEEDINGS, 2002, : 964 - 967
[9] A Deep Neural Network for Audio-Visual Person Recognition
Alam, Mohammad Rafiqul
Bennamoun, Mohammed
Togneri, Roberto
Sohel, Ferdous
[J]. 2015 IEEE 7TH INTERNATIONAL CONFERENCE ON BIOMETRICS THEORY, APPLICATIONS AND SYSTEMS (BTAS 2015), 2015,
[10] Multi-Feature Audio-Visual Person Recognition
Das, Amitav
Manyam, Ohil K.
Tapaswi, Makarand
[J]. 2008 IEEE WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2008, : 227 - 232
