Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN

Cited by: 10
Authors
Nainan, Sumita [1]
Kulkarni, Vaishali [1]
Affiliations
[1] SVKMs NMIMS Deemed Univ, Mumbai, Maharashtra, India
Keywords
ASR; 1-D CNN; SVM; GMM; Fisher score; ROBUST; IDENTIFICATION; VERIFICATION; FUSION; NOISE;
DOI
10.1007/s10772-020-09771-2
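Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract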
Contemporary automatic speaker recognition (ASR) systems do not achieve 100% accuracy, making it imperative to explore techniques that improve it. Easy access to mobile devices and advances in sensor technology have made voice a preferred biometric parameter. This paper presents a comparative analysis of the ASR accuracies obtained with three classifiers: the classical Gaussian mixture model (GMM), the support vector machine (SVM), a machine learning algorithm, and the state-of-the-art 1-D CNN. The authors propose considering dynamic voice features along with static features, because the relevant speaker information they carry leads to a substantial improvement in ASR accuracy. Because concatenating features introduces redundancy and increases computational complexity, the Fisher score algorithm was employed to select the best-contributing features, which further improved accuracy. The results indicate that SVM and 1-D CNN outperform GMM. SVM and 1-D CNN gave comparable results, with 1-D CNN giving an improved accuracy of 94.77% in ASR.
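The Fisher score selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definition of the Fisher score (between-class scatter of a feature divided by its within-class scatter); the abstract does not state which variant the authors used. The function names fisher_scores and select_top_k, and the toy data in the usage note, are hypothetical and not taken from the paper.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter.

    X: (n_samples, n_features) feature matrix, e.g. concatenated static and
       dynamic speech features (an assumption for illustration).
    y: (n_samples,) integer speaker labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]          # samples belonging to one speaker
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant guards against division by zero for constant features.
    return between / (within + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher score, best first."""
    scores = fisher_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

As a usage example, with a matrix X of shape (n_samples, n_features) and a label vector y, X[:, select_top_k(X, y, 20)] would keep the 20 highest-scoring columns before a classifier is trained on them. The value 20 is arbitrary here; the abstract does not say how many features were retained.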
Pages: 809-822
Number of pages: 14