Speech recognition with dynamic Bayesian networks

Cited by: 0
Authors
Zweig, G [1]
Russell, S [1]
Affiliation
[1] Univ Calif Berkeley, Dept Comp Sci, Berkeley, CA 94720 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dynamic Bayesian networks (DBNs) are a useful tool for representing complex stochastic processes. Recent developments in inference and learning in DBNs allow their use in real-world applications. In this paper, we apply DBNs to the problem of speech recognition. The factored state representation enabled by DBNs allows us to explicitly represent long-term articulatory and acoustic context in addition to the phonetic-state information maintained by hidden Markov models (HMMs). Furthermore, it enables us to model the short-term correlations among multiple observation streams within single time-frames. Given a DBN structure capable of representing these long- and short-term correlations, we applied the EM algorithm to learn models with up to 500,000 parameters. The use of structured DBN models decreased the error rate by 12 to 29% on a large-vocabulary isolated-word recognition task, compared to a discrete HMM; it also improved significantly on other published results for the same task. This is the first successful application of DBNs to a large-scale speech recognition problem. Investigation of the learned models indicates that the hidden state variables are strongly correlated with acoustic properties of the speech signal.
Pages: 173-180
Page count: 8