Improved data modeling for text-dependent speaker recognition using sub-band processing

Cited by: 9

Authors:

Finan R.A. [1]

Damper R.I. [2]

Sapeluk A.T. [1]

Affiliations:

[1] School of Engineering, University of Abertay Dundee

[2] Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Hants

Source:

International Journal of Speech Technology | 2001 / Vol. 4 / No. 1

Keywords:

Bias-variance dilemma; Fletcher-Allen principle; Information fusion; Linear prediction; Speaker recognition

DOI:

10.1023/A:1009652732313


Abstract:

A growing body of recent work documents the potential benefits of sub-band processing over wideband processing in automatic speech recognition and, less often, in speaker recognition. The sub-band approach frequently improves performance (especially in the presence of noise), but not always. This raises the question of precisely when and how sub-band processing is advantageous, a question that is difficult to answer because only a rudimentary theoretical framework yet exists to guide this work. We describe a simple sub-band speaker recognition system designed to facilitate experiments aimed at understanding the approach better. It splits the time-domain speech signal into 16 sub-bands using a bank of second-order filters spaced on the psychophysical mel scale. Each sub-band has its own cepstral-based recognition system, and the outputs of these systems are combined using the sum rule to produce a final decision. We find that, relative to the wideband system, sub-band processing yields worthwhile reductions in both the verification and the identification error rates: the identification error rate falls from 3.33% to 0.56%, and the equal error rate for verification falls by approximately 50%, for clean speech. We advance the hypothesis that, unlike wideband processing, sub-band processing effectively constrains the free parameters of the speaker models to be deployed more uniformly across frequency; as such, it offers a practical solution to the bias/variance dilemma of data modeling. Much remains to be done to explore this new paradigm fully. Accordingly, several avenues for future work are identified; in particular, we aim to examine in more depth the hypothesis that sub-band processing offers a practical solution to the bias/variance dilemma.
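The front end and fusion stage described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the mel formula 2595·log10(1 + f/700), the 8 kHz sampling rate, the 100 to 3800 Hz analysis range, the use of the RBJ-cookbook band-pass biquad as the "second-order filter", the rule that each band is about one mel step wide, and all helper names (`mel_filterbank`, `biquad_filter`, `split_into_subbands`, `sum_rule`, `identify`) are hypothetical choices. The per-band cepstral recognisers are not shown; each one is stood in for by a dictionary of per-speaker scores, where a higher score is assumed to mean a better match.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coeffs = Tuple[List[float], List[float]]  # (b, a) with a[0] == 1


def hz_to_mel(f: float) -> float:
    # Widely used mel formula; assumed here, the abstract does not give the exact variant.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def bandpass_biquad(fc: float, q: float, fs: float) -> Coeffs:
    """Second-order band-pass section (RBJ cookbook form, 0 dB gain at fc), normalised so a[0] == 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def mel_filterbank(n_bands: int, f_low: float, f_high: float, fs: float) -> List[Tuple[float, Coeffs]]:
    """Hypothetical bank of n_bands biquads whose centres are equally spaced in mel.

    Centres lie strictly inside (f_low, f_high). Each band is made roughly one
    mel step wide (an assumed rule) by setting Q = fc / bandwidth_in_Hz.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    bank = []
    for k in range(1, n_bands + 1):
        m_c = m_low + k * step
        fc = mel_to_hz(m_c)
        bw = mel_to_hz(m_c + step / 2) - mel_to_hz(m_c - step / 2)
        bank.append((fc, bandpass_biquad(fc, fc / bw, fs)))
    return bank


def biquad_filter(coeffs: Coeffs, x: Sequence[float]) -> List[float]:
    """Apply one second-order IIR section in direct form I."""
    b, a = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x1, x2 = xn, x1  # shift input delay line
        y1, y2 = yn, y1  # shift output delay line
        y.append(yn)
    return y


def split_into_subbands(x: Sequence[float], bank: List[Tuple[float, Coeffs]]) -> List[List[float]]:
    """One band-passed copy of the time-domain signal per filter in the bank."""
    return [biquad_filter(coeffs, x) for _, coeffs in bank]


def sum_rule(subband_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Fuse per-band outputs by adding each speaker's score over all sub-bands."""
    fused: Dict[str, float] = {}
    for scores in subband_scores:
        for speaker, s in scores.items():
            fused[speaker] = fused.get(speaker, 0.0) + s
    return fused


def identify(subband_scores: List[Dict[str, float]]) -> str:
    """Closed-set identification: the speaker with the largest fused score (higher = better, assumed)."""
    fused = sum_rule(subband_scores)
    return max(fused, key=fused.get)
```

For example, `mel_filterbank(16, 100.0, 3800.0, 8000.0)` yields 16 centre frequencies running from about 185 Hz up to about 3.4 kHz, and `split_into_subbands(signal, bank)` returns one filtered copy of the signal per band, ready for per-band cepstral analysis. Because the sum rule simply adds scores, it only makes sense when the per-band scores are on a comparable scale (for instance posterior probabilities or normalised log-likelihoods); if the per-band recognisers returned distances instead, the decision would take the minimum rather than the maximum.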


Pages: 45-62

Page count: 17

