Improved data modeling for text-dependent speaker recognition using sub-band processing

Cited by: 9
Authors
Finan R.A. [1]
Damper R.I. [2]
Sapeluk A.T. [1]
Affiliations
[1] School of Engineering, University of Abertay Dundee
[2] Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Hants
Keywords
Bias-variance dilemma; Fletcher-Allen principle; Information fusion; Linear prediction; Speaker recognition
DOI
10.1023/A:1009652732313
Abstract
A growing body of recent work documents the potential benefits of sub-band processing over wideband processing in automatic speech recognition and, less commonly, speaker recognition. The sub-band approach often delivers performance improvements (especially in the presence of noise), but not always. This raises the question of precisely when and how sub-band processing might be advantageous, which is difficult to answer because there is as yet only a rudimentary theoretical framework guiding this work. We describe a simple sub-band speaker recognition system designed to facilitate experimentation aimed at increasing understanding of the approach. The system splits the time-domain speech signal into 16 sub-bands using a bank of second-order filters spaced on the psychophysical mel scale. Each sub-band has its own separate cepstral-based recognition system, the outputs of which are combined using the sum rule to produce a final decision. We find that sub-band processing leads to worthwhile reductions in both the verification and identification error rates relative to the wideband system, decreasing the identification error rate from 3.33% to 0.56% and the equal error rate for verification by approximately 50% for clean speech. The hypothesis is advanced that, unlike the wideband system, sub-band processing effectively constrains the free parameters of the speaker models to be more uniformly deployed across frequency: as such, it offers a practical solution to the bias/variance dilemma of data modeling. Much remains to be done to explore fully the new paradigm of sub-band processing. Accordingly, several avenues for future work are identified. In particular, we aim to explore the hypothesis of a practical solution to the bias/variance dilemma in more depth.
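The two structural ideas in the abstract, mel-spaced sub-bands and sum-rule fusion of per-band decisions, can be illustrated with a minimal sketch. It is not the authors' implementation. The mel formula used is the common 2595·log10(1 + f/700) approximation, the 100 Hz to 4000 Hz range is a hypothetical choice (the abstract gives no frequency limits), and the function names `mel_centre_frequencies` and `sum_rule_identify` are invented for illustration. The per-band scores are assumed to be posterior-like values where higher means a better match.

```python
import math


def hz_to_mel(f_hz):
    # Common mel approximation; assumed here, not taken from the paper.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centre_frequencies(n_bands=16, f_low=100.0, f_high=4000.0):
    """Centre frequencies in Hz, equally spaced on the mel scale.

    f_low and f_high are hypothetical band limits; the centres lie
    strictly between them.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + step * (k + 1)) for k in range(n_bands)]


def sum_rule_identify(band_scores):
    """Fuse per-band scores with the sum rule.

    band_scores is a list with one dict per sub-band, mapping a
    speaker label to that band's score (higher = better match).
    Returns the speaker whose summed score is largest.
    """
    totals = {}
    for scores in band_scores:
        for speaker, score in scores.items():
            totals[speaker] = totals.get(speaker, 0.0) + score
    return max(totals, key=totals.get)


if __name__ == "__main__":
    centres = mel_centre_frequencies()
    print([round(f) for f in centres])

    # Three toy sub-band recognisers scoring two speakers.
    toy_scores = [
        {"a": 0.6, "b": 0.4},
        {"a": 0.3, "b": 0.7},
        {"a": 0.5, "b": 0.5},
    ]
    print(sum_rule_identify(toy_scores))
```

In the toy example the first band alone would pick speaker "a", but the summed scores favour "b" (1.6 against 1.4), which is the sense in which the final decision pools evidence from every sub-band rather than relying on any single one.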
Pages: 45-62
Number of pages: 17