Improved data modeling for text-dependent speaker recognition using sub-band processing

Cited by: 9

Authors:

Finan R.A. [1]

Damper R.I. [2]

Sapeluk A.T. [1]

Affiliations:

[1] School of Engineering, University of Abertay Dundee

[2] Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Hants

Source:

International Journal of Speech Technology | 2001 / Vol. 4 / No. 1

Keywords:

Bias-variance dilemma; Fletcher-Allen principle; Information fusion; Linear prediction; Speaker recognition

DOI:

10.1023/A:1009652732313

Abstract:

A growing body of recent work documents the potential benefits of sub-band processing over wideband processing in automatic speech recognition and, less usually, speaker recognition. It is often found that the sub-band approach delivers performance improvements (especially in the presence of noise), but not always so. This raises the question of precisely when and how sub-band processing might be advantageous, which is difficult to answer because there is as yet only a rudimentary theoretical framework guiding this work. We describe a simple sub-band speaker recognition system designed to facilitate experimentation aimed at increasing understanding of the approach. This splits the time-domain speech signal into 16 sub-bands using a bank of second-order filters spaced on the psychophysical mel scale. Each sub-band has its own separate cepstral-based recognition system, the outputs of which are combined using the sum rule to produce a final decision. We find that sub-band processing leads to worthwhile reductions in both the verification and identification error rates relative to the wideband system, decreasing the identification error rate from 3.33% to 0.56% and the equal error rate for verification by approximately 50% for clean speech. The hypothesis is advanced that, unlike the wideband system, sub-band processing effectively constrains the free parameters of the speaker models to be more uniformly deployed across frequency: as such, it offers a practical solution to the bias/variance dilemma of data modeling. Much remains to be done to explore fully the new paradigm of sub-band processing. Accordingly, several avenues for future work are identified. In particular, we aim to explore the hypothesis of a practical solution to the bias/variance dilemma in more depth.
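
As a rough illustration of two ideas named in the abstract (filters spaced on the mel scale, and sum-rule fusion of per-band recognizer outputs), the following is a minimal sketch and not the authors' implementation. The band edges (100 Hz and 4000 Hz), the commonly used mel formula 2595 * log10(1 + f/700), the function names, and the toy scores are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def hz_to_mel(f_hz):
    # Widely used mel approximation (an assumption; the paper may use another).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_bands=16, f_low=100.0, f_high=4000.0):
    # Place n_bands center frequencies at equal mel intervals strictly
    # between the assumed band edges f_low and f_high.
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_bands)]

def sum_rule_identify(band_scores):
    # band_scores: one dict per sub-band mapping speaker id -> score
    # (higher is better). The sum rule adds the scores across bands
    # and picks the speaker with the largest total.
    totals = {}
    for scores in band_scores:
        for speaker, score in scores.items():
            totals[speaker] = totals.get(speaker, 0.0) + score
    best = max(totals, key=totals.get)
    return best, totals

if __name__ == "__main__":
    centers = mel_center_frequencies()
    print([round(c) for c in centers])
    # Hypothetical scores from three sub-band recognizers for two speakers.
    toy_scores = [
        {"spk_a": 0.6, "spk_b": 0.4},
        {"spk_a": 0.3, "spk_b": 0.7},
        {"spk_a": 0.7, "spk_b": 0.3},
    ]
    print(sum_rule_identify(toy_scores)[0])
```

In this toy case one band favors the wrong speaker, yet the summed score still selects spk_a, which is the kind of robustness that fusing independent per-band decisions is intended to provide.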

Pages: 45-62

Page count: 17
