DATA SAMPLING ENSEMBLE ACOUSTIC MODELLING IN SPEAKER INDEPENDENT SPEECH RECOGNITION

Cited by: 2
Authors
Chen, Xin [1]
Zhao, Yunxin [1]
Affiliation
[1] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
Keywords
ensemble acoustic modeling; recurrent neural network; speaker overlapped clustering; data sampling; speaker adaptation;
DOI
10.1109/ICASSP.2010.5495029
CLC number (Chinese Library Classification)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we extend our recent data-sampling-based ensemble acoustic modeling technique to the speaker-independent TIMIT task and propose new methods to further improve the effectiveness of the ensemble acoustic models. We propose applying overlapped speaker clustering in data sampling to construct an ensemble of acoustic models for speaker-independent speech recognition. In addition, we evaluate data sampling in a recurrent neural network (RNN) for constructing an RNN-based frame classifier. We also investigate using cross-validation EM (CVEM) in place of EM in our ensemble acoustic model training. Using these methods on the speaker-independent TIMIT phone recognition task, we obtained a 2.5% absolute gain in phone accuracy over a standard HMM baseline system.
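The abstract names overlapped speaker clustering as the data-sampling step for building the ensemble but does not spell out how it works. The following is a minimal sketch under stated assumptions, not the authors' procedure: each speaker is summarized by a single feature vector (a hypothetical choice, e.g. a mean feature vector), speakers are grouped with plain k-means, and each speaker is then assigned to its `m` nearest centroids so that the `k` clusters overlap. Each cluster's speakers would supply the training data for one ensemble member. The uniform posterior averaging in `combine_posteriors` is likewise an assumed combination rule. All function and parameter names here (`overlapped_clusters`, `combine_posteriors`, `k`, `m`) are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            # Hard-assign each point to its closest centroid.
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def overlapped_clusters(speakers: Dict[str, Vector], k: int, m: int) -> List[List[str]]:
    """Assumed overlapped clustering: each speaker joins its m nearest clusters."""
    centroids = kmeans(list(speakers.values()), k)
    clusters: List[List[str]] = [[] for _ in range(k)]
    for name, vec in speakers.items():
        nearest = sorted(range(k), key=lambda c: sq_dist(vec, centroids[c]))[:m]
        for c in nearest:
            clusters[c].append(name)
    return clusters


def combine_posteriors(posteriors: List[List[float]]) -> List[float]:
    """Average per-model class posteriors for one frame (uniform weights, assumed)."""
    n = len(posteriors)
    return [sum(col) / n for col in zip(*posteriors)]
```

With `m = 1` the clusters form an ordinary partition of the speakers; raising `m` toward `k` makes every training subset larger and the subsets more similar, which trades ensemble diversity for per-model training data.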
Pages: 5130-5133
Number of pages: 4