Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

Cited by: 1
Authors
Islam, Md. Rabiul [1]
Sobhan, Md. Abdus [2]
Affiliations
[1] Rajshahi Univ Engn & Technol, Dept Comp Sci & Engn, Rajshahi 6204, Bangladesh
[2] Independent Univ, Sch Engn & Comp Sci, Dhaka 1229, Bangladesh
DOI
10.1155/2014/831830
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a feature-fusion-based Audio-Visual Speaker Identification (AVSI) system and evaluates it under varying illumination conditions. Among the available fusion strategies, feature-level fusion is used, with a Hidden Markov Model (HMM) for learning and classification. Because the feature set contains richer information about the raw biometric data than any other level, integration at the feature level is expected to give better authentication results. Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to form the audio feature vectors, and Active Shape Model (ASM) based appearance and shape facial features are concatenated to form the visual feature vectors. The combined audio and visual features are then used for feature fusion. Principal Component Analysis (PCA) is used to reduce the dimension of the audio and visual feature vectors. Performance is measured on the VALID audio-visual database under four different illumination levels. The experimental results focus on the significance of the proposed system with various combinations of audio and visual features.
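The feature-level fusion described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes the audio and visual streams are already aligned to the same number of frames, that PCA is applied to each modality before concatenation (the abstract does not state the order), and that random arrays stand in for real MFCC, LPCC, and ASM features. The function names `pca_reduce` and `fuse_features` and all dimensions are hypothetical, and the HMM training step is omitted.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row-vector features onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fuse_features(audio, visual, n_audio, n_visual):
    """Feature-level fusion: reduce each modality, then concatenate per frame."""
    if audio.shape[0] != visual.shape[0]:
        raise ValueError("audio and visual streams need the same number of frames")
    return np.hstack([pca_reduce(audio, n_audio), pca_reduce(visual, n_visual)])


# Placeholder data (assumed shapes): 50 frames of each feature type.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(50, 13))    # stand-in for MFCC vectors
lpcc = rng.normal(size=(50, 12))    # stand-in for LPCC vectors
audio = np.hstack([mfcc, lpcc])     # combined audio features, 25 per frame
visual = rng.normal(size=(50, 40))  # stand-in for ASM shape + appearance features

fused = fuse_features(audio, visual, n_audio=10, n_visual=10)
print(fused.shape)  # (50, 20)
```

Each row of `fused` would then serve as one observation vector in a per-speaker HMM, which is where the learning and classification step named in the abstract would come in.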
Pages: 7
Related papers
50 in total
  • [31] Semi-Coupled Hidden Markov Model with State-Based Alignment Strategy for Audio-Visual Emotion Recognition
    Lin, Jen-Chun
    Wu, Chung-Hsien
    Wei, Wen-Li
    [J]. AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, PT I, 2011, 6974 : 185 - 194
  • [32] Audio-Visual Speech Recognition Using A Two-Step Feature Fusion Strategy
    Liu, Hong
    Xu, Wanlu
    Yang, Bing
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 1896 - 1903
  • [33] A confidence-based late fusion framework for audio-visual biometric identification
    Alam, Mohammad Rafiqul
    Bennamoun, Mohammed
    Togneri, Roberto
    Sohel, Ferdous
    [J]. PATTERN RECOGNITION LETTERS, 2015, 52 : 65 - 71
  • [34] Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model
    Ahmad, Rehan
    Zubair, Syed
    Alquhayz, Hani
    Ditta, Allah
    [J]. SENSORS, 2019, 19 (23)
  • [35] A LIP GEOMETRY APPROACH FOR FEATURE-FUSION BASED AUDIO-VISUAL SPEECH RECOGNITION
    Ibrahim, M. Z.
    Mulvaney, D. J.
    [J]. 2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP), 2014, : 644 - 647
  • [36] Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models
    Feng, Wei
    Xie, Lei
    Zeng, Jia
    Liu, Zhi-Qiang
    [J]. JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2009, 20 (03) : 188 - 195
  • [37] A ResNet-Based Audio-Visual Fusion Model for Piano Skill Evaluation
    Zhao, Xujian
    Wang, Yixin
    Cai, Xuebo
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (13)
  • [38] Lip landmark-based audio-visual speech enhancement with multimodal feature fusion network
    Li, Yangke
    Zhang, Xinman
    [J]. NEUROCOMPUTING, 2023, 549
  • [39] Vehicle classification based on audio-visual feature fusion with low-quality images and noise
    Zhao, Yiming
    Zhao, Hongdong
    Zhang, Xuezhi
    Liu, Weina
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (05) : 8931 - 8944
  • [40] Omnidirectional Audio-Visual Talker Localizer With Dynamic Feature Fusion Based on Validity and Reliability Criteria
    Denda, Yuki
    Nishiura, Takanobu
    Yamashita, Yoichi
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2320 - +