Processing of linear prediction residual in spectral and cepstral domains for speaker information

Cited by: 5
Authors
Pati D. [1]
Prasanna S.R.M. [1]
Affiliations
[1] Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati
Source
Int J Speech Technol / Vol. 3 / pp. 333-350
Keywords
LP residual; M-PDSS; R-MFCC; R-MSE; Source information; Speaker recognition; Spectral and cepstral domains
DOI
10.1007/s10772-015-9273-9
Abstract
In this work, the linear prediction (LP) residual is processed in the spectral and cepstral domains to model speaker-specific excitation information. In the spectral domain, the excitation energy information is modeled by subband energies (SBE), and the excitation periodicity information is modeled by the power differences of spectrum in subband (PDSS) measure. This work refines the existing methods of extracting SBE and PDSS by exploiting the nature of the excitation spectrum. The SBE and PDSS values are computed from the mel-warped residual subband spectrum and are called residual mel subband energies (R-MSE) and mel power differences of subband spectra (M-PDSS), respectively. Speaker recognition studies performed on the NIST-99 and NIST-03 databases demonstrate that the R-MSE and M-PDSS features carry good speaker information. It is also demonstrated that the excitation energy information can be better modeled in the cepstral domain by residual mel frequency cepstral coefficients (R-MFCC). Further, the evidence provided by the M-PDSS and R-MFCC features is different, combines well, and provides improved recognition performance. The combined evidence from M-PDSS and R-MFCC, together with the vocal tract information, further improves the performance. Finally, a comparative study of processing the LP residual in the temporal, spectral and cepstral domains demonstrates that, with a small compromise in recognition performance, processing the LP residual in the spectral and cepstral domains provides a compact and effective way of representing the excitation information compared to temporal processing. © 2015, Springer Science+Business Media New York.
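The abstract names three residual-based features but gives no formulas, so the following is only a rough sketch of how such features might be computed for a single frame, not the authors' implementation. It assumes autocorrelation-method LP analysis, rectangular (non-overlapping) mel-spaced subbands, and the commonly used PDSS definition of one minus the ratio of geometric to arithmetic mean of the subband power spectrum. The function names, the LP order, the band count and the FFT size are all hypothetical choices made for illustration.

```python
import numpy as np

def lp_residual(frame, order=10):
    """LP residual of one frame via the autocorrelation method (assumed here)."""
    w = frame * np.hamming(len(frame))
    # Autocorrelation lags r[0..order]
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)          # small ridge for numerical safety
    a = np.linalg.solve(R, r[1 : order + 1])  # predictor coefficients
    # Inverse filtering: e[n] = s[n] - sum_k a[k] * s[n-k]
    return np.convolve(w, np.concatenate(([1.0], -a)))[: len(w)]

def mel_band_edges(n_bands, n_bins, fs):
    """FFT-bin indices of n_bands + 1 edges spaced uniformly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz = inv(np.linspace(0.0, mel(fs / 2.0), n_bands + 1))
    return np.round(hz / (fs / 2.0) * (n_bins - 1)).astype(int)

def residual_features(frame, fs=8000, order=10, n_bands=12, n_ceps=8, n_fft=512):
    """Return (R-MSE, M-PDSS, R-MFCC)-style vectors for one frame (illustrative)."""
    e = lp_residual(frame, order)
    P = np.abs(np.fft.rfft(e, n_fft)) ** 2 + 1e-12   # residual power spectrum
    edges = mel_band_edges(n_bands, len(P), fs)
    r_mse = np.empty(n_bands)
    m_pdss = np.empty(n_bands)
    for i in range(n_bands):
        lo, hi = edges[i], max(edges[i + 1], edges[i] + 1)  # at least one bin
        band = P[lo:hi]
        r_mse[i] = band.sum()                                # subband energy
        # PDSS: 1 - geometric mean / arithmetic mean (assumed definition)
        m_pdss[i] = 1.0 - np.exp(np.mean(np.log(band))) / np.mean(band)
    # DCT-II of the log subband energies gives cepstral coefficients
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_bands) + 0.5) / n_bands)
    r_mfcc = basis @ np.log(r_mse)
    return r_mse, m_pdss, r_mfcc

# Usage on a synthetic 30 ms frame at 8 kHz (random noise stands in for speech)
rng = np.random.default_rng(0)
frame = rng.standard_normal(240)
r_mse, m_pdss, r_mfcc = residual_features(frame)
print(r_mse.shape, m_pdss.shape, r_mfcc.shape)
```

Under this definition a subband with a flat spectrum gives a PDSS value near 0, while a subband dominated by strong harmonic peaks gives a value closer to 1, which is why the measure is associated with excitation periodicity.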
Pages: 333-350
Number of pages: 17
Related papers
50 in total
  • [41] FURTHER OPTIMISATIONS OF CONSTANT Q CEPSTRAL PROCESSING FOR INTEGRATED UTTERANCE AND TEXT-DEPENDENT SPEAKER VERIFICATION
    Delgado, Hector
    Todisco, Massimiliano
    Sahidullah, Md
    Sarkar, Achintya K.
    Evans, Nicholas
    Kinnunen, Tomi
    Tan, Zheng-Hua
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 179 - 185
  • [42] Selection of the Best Set of Shifted Delta Cepstral Features in Speaker Verification Using Mutual Information.
    Calvo, Jose R.
    Fernandez, Rafael
    Hernandez, Gabriel
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2335 - 2338
  • [43] Fusion of Static and Transitional Information of Cepstral and Spectral Features for Music Genre Classification
    Lee, Chang-Hsing
    Shih, Jau-Ling
    Yu, Kun-Ming
    Lin, Hwai-San
    Wei, Ming-Hui
    2008 IEEE ASIA-PACIFIC SERVICES COMPUTING CONFERENCE, VOLS 1-3, PROCEEDINGS, 2008, : 751 - 756
  • [44] FAST MODELLING OF PINNA SPECTRAL NOTCHES FROM HRTFS USING LINEAR PREDICTION RESIDUAL CEPSTRUM
    Ahuja, Chaitanya
    Hegde, Rajesh M.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [45] SPEAKER RECOGNITION AND VERIFICATION USING LINEAR PREDICTION ANALYSIS
    SAMBUR, MR
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1973, 53 (01): : 354 - &
  • [46] Formant Based Linear Prediction Coefficients for Speaker Identification
    Srivastava, Sumit
    Nandi, Pratibha
    Sahoo, G.
    Chandra, Mahesh
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014, : 685 - 688
  • [47] INCIDENTAL PROCESSING OF SPEAKER CHARACTERISTICS - VOICE AS CONNOTATIVE INFORMATION
    GEISELMAN, RE
    CRAWLEY, JM
    JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR, 1983, 22 (01): : 15 - 23
  • [48] Automatic identification of music performer using the linear prediction cepstral coefficients method
    Chudy, Magdalena
    ARCHIVES OF ACOUSTICS, 2008, 33 (01) : 27 - 33
  • [49] PULSED RESIDUAL EXCITED LINEAR PREDICTION
    KONDOZ, AM
    HOROS, J
    EVANS, BG
    SUDDLE, MR
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 1995, 142 (02): : 105 - 110
  • [50] Replay Attack Detection in Automatic Speaker Verification Based on ResNeWt18 with Linear Frequency Cepstral Coefficients
    Chaiwongyen, Anuwat
    Pinkeaw, Kanokkarn
    Kongprawechnon, Waree
    Karnjana, Jessada
    Unoki, Masashi
    16TH INTERNATIONAL JOINT SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE PROCESSING (ISAI-NLP 2021), 2021,