Hierarchical multi-stream posterior based speech recognition system

Authors
Ketabdar, H [1]
Bourlard, H
Bengio, S
Affiliations
[1] IDIAP Res Inst, Martigny, Switzerland
[2] EPFL, Lausanne, Switzerland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present initial results towards boosting posterior-based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into account acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). These posteriors are estimated based on the "state gamma posterior" definition (typically used in standard HMM training), extended to the case of multi-stream HMMs. This approach provides a new, principled, theoretical framework for hierarchical estimation and use of posteriors, multi-stream feature combination, and integration of appropriate context and prior knowledge in posterior estimates. In the present work, we used the resulting gamma posteriors as features for a standard HMM/GMM layer. On the OGI Digits database and on a reduced vocabulary version (1000 words) of the DARPA Conversational Telephone Speech-to-text (CTS) task, this resulted in a significant performance improvement compared to state-of-the-art Tandem systems.
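As a rough illustration of the "state gamma posterior" mentioned in the abstract, the sketch below computes gamma[t, i] = P(state at frame t is i | whole utterance) with a scaled forward-backward pass over a toy HMM. It is not the authors' implementation. The function name `state_gamma_posteriors`, the weighted-product rule for combining streams, and all toy numbers are assumptions made for the example; the abstract does not specify how the streams are combined.

```python
import numpy as np


def state_gamma_posteriors(stream_likelihoods, transitions, initial, stream_weights=None):
    """Return gamma[t, i] = P(state_t = i | all frames) for a toy multi-stream HMM.

    stream_likelihoods: list of (T, N) arrays, one per feature stream, holding
        per-frame, per-state emission likelihoods.
    transitions: (N, N) transition matrix; zeros encode topological constraints.
    initial: (N,) initial state distribution.
    """
    streams = [np.asarray(s, dtype=float) for s in stream_likelihoods]
    if stream_weights is None:
        stream_weights = [1.0] * len(streams)

    # ASSUMPTION: streams are combined as a weighted product of likelihoods.
    emission = np.ones_like(streams[0])
    for lik, w in zip(streams, stream_weights):
        emission *= lik ** w

    T, N = emission.shape
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))

    # Forward pass, rescaled per frame to avoid numerical underflow.
    alpha[0] = initial * emission[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ transitions) * emission[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also rescaled per frame.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = transitions @ (emission[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Per-frame scaling constants cancel when each frame is renormalised.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma


# Toy usage: two streams, three states, strict left-to-right topology.
rng = np.random.default_rng(0)
stream_a = rng.uniform(0.1, 1.0, size=(6, 3))
stream_b = rng.uniform(0.1, 1.0, size=(6, 3))
left_to_right = np.array([[0.6, 0.4, 0.0],
                          [0.0, 0.7, 0.3],
                          [0.0, 0.0, 1.0]])
start = np.array([1.0, 0.0, 0.0])
gamma = state_gamma_posteriors([stream_a, stream_b], left_to_right, start)
print(gamma.round(3))
```

Each row of `gamma` depends on every frame of the utterance through the forward and backward terms, and the zeros in the transition matrix act as the topological prior. In the setup the abstract describes, posteriors of this kind would then serve as input features to a further recognition layer.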
Pages: 294-306
Page count: 13