Improved data modeling for text-dependent speaker recognition using sub-band processing

Cited by: 9
Authors
Finan R.A. [1]
Damper R.I. [2]
Sapeluk A.T. [1]
Affiliations
[1] School of Engineering, University of Abertay Dundee
[2] Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Hants
Keywords
Bias-variance dilemma; Fletcher-Allen principle; Information fusion; Linear prediction; Speaker recognition
DOI
10.1023/A:1009652732313
Abstract
A growing body of recent work documents the potential benefits of sub-band processing over wideband processing in automatic speech recognition and, less usually, speaker recognition. It is often found that the sub-band approach delivers performance improvements (especially in the presence of noise), but not always so. This raises the question of precisely when and how sub-band processing might be advantageous, which is difficult to answer because there is as yet only a rudimentary theoretical framework guiding this work. We describe a simple sub-band speaker recognition system designed to facilitate experimentation aimed at increasing understanding of the approach. This splits the time-domain speech signal into 16 sub-bands using a bank of second-order filters spaced on the psychophysical mel scale. Each sub-band has its own separate cepstral-based recognition system, the outputs of which are combined using the sum rule to produce a final decision. We find that sub-band processing leads to worthwhile reductions in both the verification and identification error rates relative to the wideband system, decreasing the identification error rate from 3.33% to 0.56% and the equal error rate for verification by approximately 50% for clean speech. The hypothesis is advanced that, unlike the wideband system, sub-band processing effectively constrains the free parameters of the speaker models to be more uniformly deployed across frequency: as such, it offers a practical solution to the bias/variance dilemma of data modeling. Much remains to be done to explore fully the new paradigm of sub-band processing. Accordingly, several avenues for future work are identified. In particular, we aim to explore the hypothesis of a practical solution to the bias/variance dilemma in more depth.
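The two structural ideas in the abstract, band centres spaced on the mel scale and sum-rule fusion of per-band outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the mel formula used here is one common approximation, and the 0 to 4000 Hz range, the function names, and the example scores are all hypothetical choices made for illustration. The filtering and cepstral modelling stages are omitted.

```python
import math

def hz_to_mel(f_hz):
    # One widely used mel approximation; the paper may use a different one.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel above.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_bands, f_low, f_high):
    """Return n_bands centre frequencies (Hz), equally spaced on the
    mel scale and lying strictly between f_low and f_high."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_bands)]

def sum_rule_identify(band_scores):
    """Sum-rule fusion for identification.

    band_scores holds one dict per sub-band, mapping speaker label to a
    match score (higher means a better match). Assumes every sub-band
    scores every enrolled speaker. Returns (best_speaker, summed_scores).
    """
    totals = {}
    for scores in band_scores:
        for speaker, score in scores.items():
            totals[speaker] = totals.get(speaker, 0.0) + score
    best = max(totals, key=totals.get)
    return best, totals

# Hypothetical usage: 16 bands over an assumed 0-4000 Hz range, and
# made-up scores from three sub-band recognisers for two speakers.
centers = mel_center_frequencies(16, 0.0, 4000.0)
winner, totals = sum_rule_identify([
    {"spk_a": 0.6, "spk_b": 0.4},
    {"spk_a": 0.3, "spk_b": 0.7},
    {"spk_a": 0.8, "spk_b": 0.2},
])
```

In this made-up example one sub-band prefers `spk_b`, yet the summed scores still favour `spk_a`, which is the sense in which the sum rule lets the bands jointly decide.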
Pages: 45-62
Page count: 17
Related papers
50 in total
  • [1] Sub-band based text-dependent speaker verification
    Sivakumaran, P
    Ariyaeeinia, AM
    Loomes, MJ
    SPEECH COMMUNICATION, 2003, 41 (2-3) : 485 - 509
  • [2] Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition
    Li, Lantian
    Lin, Yiye
    Zhang, Zhiyong
    Wang, Dong
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 426 - 429
  • [3] Text-dependent speaker recognition using speaker specific compensation
    Laxman, S
    Sastry, PS
    IEEE TENCON 2003: CONFERENCE ON CONVERGENT TECHNOLOGIES FOR THE ASIA-PACIFIC REGION, VOLS 1-4, 2003, : 384 - 387
  • [4] Text-dependent Speaker Recognition for Vietnamese
    Diep Dao Thi Thu
    Quang Nguyen Hong
    Loan Trinh Van
    Hung Pham Ngoc
    2013 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2013, : 196 - 200
  • [5] Speaker and Channel Factors in Text-Dependent Speaker Recognition
    Stafylakis, Themos
    Kenny, Patrick
    Alam, Md. Jahangir
    Kockmann, Marcel
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (01) : 65 - 78
  • [6] Text-dependent speaker recognition using PLDA with uncertainty propagation
    Stafylakis, T.
    Kenny, P.
    Ouellet, P.
    Perez, J.
    Kockmann, M.
    Dumouchel, P.
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3651 - 3655
  • [7] Text-dependent Speaker Recognition using Wavelets and Neural Networks
    Chee Peng Lim
    Siew Chan Woo
    Soft Computing, 2007, 11 : 549 - 556
  • [8] Text-dependent speaker recognition using wavelets and neural networks
    Lim, Chee Peng
    Woo, Siew Chan
    SOFT COMPUTING, 2007, 11 (06) : 549 - 556
  • [9] Improving Text-Dependent Speaker Recognition Performance
    Impedovo, Donato
    Refice, Mario
    TOOLS AND APPLICATIONS WITH ARTIFICIAL INTELLIGENCE, 2009, 166 : 199 - 211