Channel Robust MFCCs for Continuous Speech Speaker Recognition

Cited by: 3
Authors
Chougule, Sharada Vikram [1]
Chavan, Mahesh S. [2]
Affiliations
[1] Finolex Acad Management & Technol, Dept Elect & Telecommun Engn, Ratnagiri, Maharashtra, India
[2] KITs Coll Engn, Dept Elect Engn, Kolhapur, Maharashtra, India
Keywords
Text-independent speaker recognition; MFCC; magnitude spectral subtraction; cepstral mean normalization
DOI
10.1007/978-3-319-04960-1_48
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the years, MFCCs (Mel Frequency Cepstral Coefficients) have been used as a standard acoustic feature set for speech and speaker recognition. Models derived from these features give optimum speaker recognition performance when training and testing conditions are the same. However, a mismatch between training and testing conditions, or in the type of channel used to create the speaker model, drastically degrades the performance of a speaker recognition system. In this experimental research, the performance of MFCCs for closed-set, text-independent speaker recognition is studied under different training and testing conditions. Magnitude spectral subtraction is used to estimate the magnitude spectrum of clean speech by removing an estimate of the additive noise magnitude. The mel-warped cepstral coefficients are then normalized by subtracting their mean; this cepstral mean normalization reduces the effect of the convolutional noise introduced by a change of channel between training and testing. The performance of these modified MFCCs has been tested on a multi-speaker continuous (Hindi) speech database (by the Department of Information Technology, Government of India). Compared with conventional MFCCs, the improved MFCCs substantially raise speaker recognition performance.
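The two processing steps named in the abstract can be illustrated with a minimal numpy sketch: magnitude spectral subtraction applied to the short-time magnitude spectrum, followed by standard MFCC extraction and cepstral mean normalization (CMN). This is not the authors' implementation. All function names and settings here are illustrative assumptions: 16 kHz audio, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, 26 mel filters, 13 cepstral coefficients, a noise magnitude estimated from the first 10 frames (assumed to be speech-free), and a spectral floor of 0.01 times the noisy magnitude.

```python
import numpy as np

# Illustrative settings (assumptions, not taken from the paper).
SR, FRAME_LEN, HOP, N_FFT = 16000, 400, 160, 512

def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    # Split a 1-D signal into overlapping, Hamming-windowed frames.
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def spectral_subtraction(mag, noise_frames=10, floor=0.01):
    # Estimate the additive-noise magnitude from the leading frames
    # (assumed speech-free) and subtract it from every frame; clamp the
    # result to a small fraction of the noisy magnitude to avoid negatives.
    noise_mag = mag[:noise_frames].mean(axis=0)
    return np.maximum(mag - noise_mag, floor * mag)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced uniformly on the mel scale.
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def dct_ii(x, n_coeffs):
    # Orthonormal DCT-II along the last axis, keeping the first n_coeffs terms.
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    out = x @ basis.T
    out[..., 0] *= np.sqrt(1.0 / n)
    out[..., 1:] *= np.sqrt(2.0 / n)
    return out

def robust_mfcc(x, sr=SR, n_fft=N_FFT, n_filters=26, n_ceps=13, noise_frames=10):
    frames = frame_signal(x)
    mag = np.abs(np.fft.rfft(frames, n=n_fft))          # magnitude spectrum per frame
    mag = spectral_subtraction(mag, noise_frames)        # suppress additive noise
    mel_energy = (mag ** 2) @ mel_filterbank(n_filters, n_fft, sr).T
    ceps = dct_ii(np.log(mel_energy + 1e-10), n_ceps)    # conventional MFCCs
    return ceps - ceps.mean(axis=0)                      # cepstral mean normalization
```

A fixed convolutional channel becomes an additive offset in the log-cepstral domain, which is why subtracting the per-utterance cepstral mean removes it. A constant gain is the simplest such channel, and in this sketch `robust_mfcc(x)` and `robust_mfcc(3.0 * x)` produce the same features up to floating-point error. For a frequency-dependent channel the cancellation is only approximate, because the mel filters pool several FFT bins.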
Pages: 557-568
Page count: 12