Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN

Cited by: 10
Authors
Nainan, Sumita [1]
Kulkarni, Vaishali [1]
Affiliations
[1] SVKMs NMIMS Deemed Univ, Mumbai, Maharashtra, India
Keywords
ASR; 1-D CNN; SVM; GMM; Fisher score; ROBUST; IDENTIFICATION; VERIFICATION; FUSION; NOISE;
DOI
10.1007/s10772-020-09771-2
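Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract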
Contemporary automatic speaker recognition (ASR) systems do not achieve 100% accuracy, making it imperative to explore different techniques to improve them. Easy access to mobile devices and advances in sensor technology have made voice a preferred biometric parameter. Here, a comparative analysis is presented of the accuracies obtained in ASR with three classifiers: the classical Gaussian mixture model (GMM), the support vector machine (SVM), a machine learning algorithm, and the state-of-the-art 1-D CNN. The authors propose considering dynamic voice features along with static features, as the relevant speaker information in them leads to a substantial improvement in ASR accuracy. Because concatenating features introduces redundancy and increases computational complexity, the Fisher score algorithm was employed to select the best-contributing features, resulting in improved accuracy. The results indicate that the SVM and the 1-D CNN outperform the GMM. The SVM and the 1-D CNN gave comparable results, with the 1-D CNN giving an improved accuracy of 94.77% in ASR.
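The abstract does not spell out how the Fisher score is computed, so the following is only a minimal sketch of one common formulation: for each feature, the between-class scatter of the class means divided by the pooled within-class variance. The function names `fisher_scores` and `select_top_k`, the toy data, and the choice of `k` are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X (higher = more discriminative).

    Assumed formulation: sum_c n_c * (mean_c - mean)^2 / sum_c n_c * var_c,
    computed independently for each feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]          # rows belonging to this class (speaker)
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant guards against division by zero for constant features.
    return between / (within + 1e-12)


def select_top_k(X, y, k):
    """Return the sorted indices of the k highest-scoring features, plus all scores."""
    scores = fisher_scores(X, y)
    top = np.argsort(scores)[::-1][:k]
    return np.sort(top), scores


# Toy data (hypothetical): 3 "speakers", 2 informative features, 4 noise features.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
informative = y[:, None] * 3.0 + rng.normal(size=(150, 2))
noise = rng.normal(size=(150, 4))
X = np.hstack([informative, noise])

selected, scores = select_top_k(X, y, k=2)
print("selected feature indices:", selected)
```

In a setting like the one the abstract describes, `X` would hold the concatenated static and dynamic features per frame or utterance and `y` the speaker labels; only the selected columns would then be passed to the classifier.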
Pages: 809-822
Page count: 14