Processing of linear prediction residual in spectral and cepstral domains for speaker information

Cited by: 5

Authors:

Pati D. [1]

Prasanna S.R.M. [1]

Affiliation:

[1] Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati

Source:

Int J Speech Technol | Vol. 3 | pp. 333-350

Keywords:

LP residual; M-PDSS; R-MFCC; R-MSE; Source information; Speaker recognition; Spectral and cepstral domains

DOI:

10.1007/s10772-015-9273-9

Abstract:

In this work, the linear prediction (LP) residual is processed in the spectral and cepstral domains to model speaker-specific excitation information. In the spectral domain, the excitation energy information is modeled from subband energies (SBE), and the excitation periodicity information is modeled by the power differences of spectrum in subband (PDSS) measure. This work refines the existing methods of extracting SBE and PDSS by exploiting the nature of the excitation spectrum. The SBE and PDSS values are computed from the mel-warped residual subband spectrum and are called residual mel subband energies (R-MSE) and mel power differences of subband spectra (M-PDSS), respectively. Speaker recognition studies performed on the NIST-99 and NIST-03 databases demonstrate that the R-MSE and M-PDSS features carry useful speaker information. It is also demonstrated that the excitation energy information is better modeled in the cepstral domain by residual mel frequency cepstral coefficients (R-MFCC). Further, the evidence provided by the M-PDSS and R-MFCC features is different in nature and combines well, yielding improved recognition performance. Combining the M-PDSS and R-MFCC evidence with vocal tract information improves the performance further. Finally, a comparative study of processing the LP residual in the temporal, spectral and cepstral domains demonstrates that, at a small cost in recognition performance, spectral and cepstral processing of the LP residual provides a more compact and effective representation of the excitation information than temporal processing. © 2015, Springer Science+Business Media New York.
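The processing chain summarized in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function names are hypothetical, and the parameter values (8 kHz sampling, LP order 12, 512-point FFT, 24 triangular mel filters, 13 cepstral coefficients) are assumptions. The sketch obtains the LP residual by inverse filtering, then computes log mel subband energies of the residual spectrum (in the spirit of R-MSE), their DCT (in the spirit of R-MFCC), and a per-subband PDSS value, here assumed to be one minus the ratio of the geometric to the arithmetic mean of the residual power spectrum in each mel subband; the paper's exact definitions may differ.

```python
import numpy as np


def lp_coefficients(frame, order=12):
    """Inverse-filter coefficients [1, a1, ..., ap] via the autocorrelation
    method on a Hamming-windowed frame and the Levinson-Durbin recursion."""
    x = frame * np.hamming(len(frame))
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small floor guards against an all-zero frame
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=12):
    """LP residual e[n] = s[n] + sum_k a_k s[n-k] (inverse filtering by A(z))."""
    a = lp_coefficients(frame, order)
    return np.convolve(frame, a)[: len(frame)]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale (assumed layout)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def residual_features(frame, fs=8000, order=12, n_fft=512, n_filters=24, n_ceps=13):
    """Hypothetical helper: R-MSE-, R-MFCC- and M-PDSS-like values for one frame."""
    res = lp_residual(frame, order) * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(res, n_fft)) ** 2  # residual power spectrum
    fb = mel_filterbank(n_filters, n_fft, fs)
    r_mse = np.log(fb @ power + 1e-10)  # log mel subband energies of the residual
    # Unnormalized DCT-II of the log subband energies -> cepstral coefficients.
    m = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), m + 0.5) / n_filters)
    r_mfcc = basis @ r_mse
    # Assumed PDSS form: 1 - geometric mean / arithmetic mean of power per subband.
    m_pdss = np.zeros(n_filters)
    for i in range(n_filters):
        band = power[fb[i] > 0] + 1e-10
        if band.size > 1:
            m_pdss[i] = 1.0 - np.exp(np.mean(np.log(band))) / np.mean(band)
    return {"r_mse": r_mse, "r_mfcc": r_mfcc, "m_pdss": m_pdss}
```

For a 20 ms frame at 8 kHz (160 samples), `residual_features(frame)` returns 24 subband energies, 13 cepstral coefficients and 24 PDSS values. Estimating the LP coefficients on a windowed frame but inverse filtering the unwindowed frame is one common choice; a PDSS value near 0 indicates a flat (noise-like) subband, while values near 1 indicate a peaky, harmonic subband.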

Pages: 333-350

Number of pages: 17
