Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech

Cited by: 4
Authors
Das, Biswajit [1]
Mandal, Sandipan [1]
Mitra, Pabitra [1]
Basu, Anupam [1]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India
Keywords
Aging speech recognition; Vocal tract length normalization (VTLN); Maximum likelihood linear transform (MLLT); Maximum likelihood linear regression (MLLR); Maximum a posteriori (MAP); Maximum mutual information estimation (MMIE); VOCAL-TRACT; EXPECTATION MAXIMIZATION; NORMALIZATION; AGE;
DOI
10.1016/j.patrec.2012.10.029
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article describes the development of a Bengali speech recognition system for the aging population using various adaptation techniques. Variability in acoustic characteristics among speakers degrades speech recognition accuracy. In general, perceptual as well as acoustic variations exist among speakers, but they are more pronounced between young and aged populations. Deviations in voice source features between the two age groups affect speech recognition performance. Existing automatic speech recognition algorithms demand a large amount of training data covering all of this variability in order to build a robust system. Speaker normalization and adaptation techniques, however, attempt to reduce inter-speaker or intra-speaker acoustic variability without requiring a large amount of training data. Here, conventional acoustic model adaptation methods, e.g. vocal tract length normalization, maximum likelihood linear regression and/or maximum a posteriori, are combined to improve recognition accuracy. Moreover, the maximum mutual information estimation technique has been implemented in this study. (C) 2012 Elsevier B.V. All rights reserved.
Pages: 335-343
Number of pages: 9