Speech recognition with dynamic Bayesian networks

Cited by: 0
Authors
Zweig, G [1]
Russell, S [1]
Affiliations
[1] Univ Calif Berkeley, Dept Comp Sci, Berkeley, CA 94720 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dynamic Bayesian networks (DBNs) are a useful tool for representing complex stochastic processes. Recent developments in inference and learning in DBNs allow their use in real-world applications. In this paper, we apply DBNs to the problem of speech recognition. The factored state representation enabled by DBNs allows us to explicitly represent long-term articulatory and acoustic context in addition to the phonetic-state information maintained by hidden Markov models (HMMs). Furthermore, it enables us to model the short-term correlations among multiple observation streams within single time-frames. Given a DBN structure capable of representing these long- and short-term correlations, we applied the EM algorithm to learn models with up to 500,000 parameters. The use of structured DBN models decreased the error rate by 12 to 29% on a large-vocabulary isolated-word recognition task, compared to a discrete HMM; it also improved significantly on other published results for the same task. This is the first successful application of DBNs to a large-scale speech recognition problem. Investigation of the learned models indicates that the hidden state variables are strongly correlated with acoustic properties of the speech signal.
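The factored state representation described in the abstract can be illustrated with a toy example. The sketch below is not the authors' model: it assumes a hypothetical two-variable hidden state, a phonetic state `Q` and a slowly changing context variable `C`, each with its own transition table, and a single binary observation `O` that depends on both. All sizes and probabilities are invented for illustration. It runs the forward algorithm over the joint state `(Q, C)` to compute the likelihood of an observation sequence.

```python
from itertools import product

# Hypothetical toy parameters; none of these values come from the paper.
P_Q0 = [0.6, 0.4]                   # P(Q_1): initial phonetic state
P_C0 = [0.5, 0.5]                   # P(C_1): initial context
P_Q = [[0.7, 0.3], [0.2, 0.8]]      # P(Q_t | Q_{t-1})
P_C = [[0.9, 0.1], [0.1, 0.9]]      # P(C_t | C_{t-1}): context changes slowly
P_O = {                             # P(O_t | Q_t, C_t)
    (0, 0): [0.9, 0.1],
    (0, 1): [0.6, 0.4],
    (1, 0): [0.3, 0.7],
    (1, 1): [0.1, 0.9],
}
STATES = list(product(range(2), range(2)))  # all joint (q, c) values


def likelihood(obs):
    """Forward algorithm over the factored state (Q, C); returns P(obs)."""
    # Initialise with the prior and the first observation.
    alpha = {
        (q, c): P_Q0[q] * P_C0[c] * P_O[(q, c)][obs[0]] for q, c in STATES
    }
    # Propagate: the joint transition factors into P_Q * P_C.
    for o in obs[1:]:
        alpha = {
            (q, c): P_O[(q, c)][o]
            * sum(alpha[(qp, cp)] * P_Q[qp][q] * P_C[cp][c] for qp, cp in STATES)
            for q, c in STATES
        }
    return sum(alpha.values())


if __name__ == "__main__":
    print(likelihood([0, 1, 1]))
```

In this toy setup the two transition tables hold 8 entries in total, whereas an unfactored HMM over the same four joint states would need a 4 x 4 table of 16 entries. That gap widens quickly as more state variables are added, which is one reason a factored model can carry extra context at a manageable parameter count.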
Pages: 173-180
Page count: 8