Hierarchical multi-stream posterior based speech recognition system

Authors
Ketabdar, H [1]
Bourlard, H
Bengio, S
Affiliations
[1] IDIAP Res Inst, Martigny, Switzerland
[2] EPFL, Lausanne, Switzerland
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present initial results towards boosting posterior-based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into account acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). These posteriors are estimated based on the "state gamma posterior" definition (typically used in standard HMM training), extended to the case of multi-stream HMMs. This approach provides a new, principled, theoretical framework for hierarchical estimation and use of posteriors, multi-stream feature combination, and integration of appropriate context and prior knowledge in posterior estimates. In the present work, we used the resulting gamma posteriors as features for a standard HMM/GMM layer. On the OGI Digits database and on a reduced vocabulary version (1000 words) of the DARPA Conversational Telephone Speech-to-text (CTS) task, this resulted in significant performance improvement compared to the state-of-the-art Tandem systems.
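The "state gamma posterior" mentioned in the abstract is, in standard HMM training, the probability of being in a given state at a given frame conditioned on the whole observation sequence, usually computed with the forward-backward recursions. The following is a minimal sketch of that standard quantity on a toy model, not the authors' implementation. The function names `gamma_posteriors` and `multi_stream_gamma` are hypothetical, and the multi-stream combination assumes that streams are conditionally independent given the state, so that their per-frame likelihoods simply multiply. The abstract does not specify the combination rule actually used in the paper.

```python
from math import prod


def gamma_posteriors(init, trans, lik):
    """Return gamma[t][i] = p(q_t = i | x_1..x_T) by scaled forward-backward.

    init[i]     : initial state probability
    trans[i][j] : p(q_{t+1} = j | q_t = i)
    lik[t][i]   : emission likelihood p(x_t | q_t = i)
    """
    T, S = len(lik), len(init)

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Forward pass: alpha[t][i] is proportional to p(x_1..x_t, q_t = i).
    alpha = [normalize([init[i] * lik[0][i] for i in range(S)])]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append(normalize([
            sum(prev[i] * trans[i][j] for i in range(S)) * lik[t][j]
            for j in range(S)
        ]))

    # Backward pass: beta[t][i] is proportional to p(x_{t+1}..x_T | q_t = i).
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalize([
            sum(trans[i][j] * lik[t + 1][j] * beta[t + 1][j] for j in range(S))
            for i in range(S)
        ])

    # Gamma combines both passes, so every frame sees the whole utterance.
    return [normalize([a * b for a, b in zip(alpha[t], beta[t])])
            for t in range(T)]


def multi_stream_gamma(init, trans, stream_liks):
    """Gamma posteriors from several feature streams.

    Assumption (not from the source): streams are independent given the
    state, so per-frame stream likelihoods are multiplied before the
    forward-backward pass.
    """
    T, S = len(stream_liks[0]), len(init)
    combined = [[prod(lik[t][i] for lik in stream_liks) for i in range(S)]
                for t in range(T)]
    return gamma_posteriors(init, trans, combined)
```

Under these assumptions, the transition matrix is where prior knowledge such as topological constraints would enter (for example, zero entries for forbidden transitions), and the returned per-frame vectors are the kind of posterior features that a subsequent recognition layer could consume.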
Pages: 294 - 306
Number of pages: 13
Related papers
50 in total
  • [31] MULTI-STREAM TEMPORALLY VARYING WEIGHT REGRESSION FOR CROSS-LINGUAL SPEECH RECOGNITION
    Liu, Shilin
    Sim, Khe Chai
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 434 - 439
  • [32] Multi-Stream Convolution-Recurrent Neural Networks Based on Attention Mechanism Fusion for Speech Emotion Recognition
    Tao, Huawei
    Geng, Lei
    Shan, Shuai
    Mai, Jingchao
    Fu, Hongliang
    [J]. ENTROPY, 2022, 24 (08)
  • [33] Improved Decision Trees for Multi-stream HMM-based Audio-Visual Continuous Speech Recognition
    Huang, Jing
    Visweswariah, Karthik
    [J]. 2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 228 - +
  • [34] Multi-stream fusion network for continuous gesture recognition based on sEMG
    Li, Jun
    Zou, Chunlong
    Tang, Dalai
    Sun, Ying
    Fan, Hanwen
    Li, Boao
    Tang, Xinjie
    [J]. International Journal of Wireless and Mobile Computing, 2024, 26 (04) : 374 - 383
  • [35] A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs
    Tamura, S
    Iwano, K
    Furui, S
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 857 - 860
  • [36] Motion saliency based multi-stream multiplier ResNets for action recognition
    Zong, Ming
    Wang, Ruili
    Chen, Xiubo
    Chen, Zhe
    Gong, Yuanhao
    [J]. IMAGE AND VISION COMPUTING, 2021, 107 (107)
  • [37] Skeleton Feature Fusion Based on Multi-Stream LSTM for Action Recognition
    Wang, Lei
    Zhao, Xu
    Liu, Yuncai
    [J]. IEEE ACCESS, 2018, 6 : 50788 - 50800
  • [38] Fusion of multi-stream speech features for dialect classification
    Shweta Sinha
    Aruna Jain
    S. S. Agrawal
    [J]. CSI Transactions on ICT, 2015, 2 (4) : 243 - 252
  • [39] DBN-based multi-stream models for Mandarin toneme recognition
    Lei, X
    Ji, G
    Ng, T
    Bilmes, J
    Ostendorf, M
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 349 - 352
  • [40] Fused HMM-Adaptation of Multi-Stream HMMs for Audio-Visual Speech Recognition
    Dean, David
    Lucey, Patrick
    Sridharan, Sridha
    Wark, Tim
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2272 - 2275