A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System

Cited by: 8
Authors
Oura, Keiichiro [1 ]
Zen, Heiga [1 ,2 ]
Nankaku, Yoshihiko [1 ]
Lee, Akinobu [1 ,3 ]
Tokuda, Keiichi [1 ,4 ]
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi, Japan
[2] IBM Corp, TJ Watson Res Ctr, Human Language Technol Grp, Yorktown Hts, NY USA
[3] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
[4] ATR, Spoken Language Commun Res Labs, Kyoto, Japan
Keywords
speech recognition; hidden Markov model; hidden semi-Markov model; weighted finite-state transducer
DOI
10.1093/ietisy/e91-d.11.2693
Chinese Library Classification
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
In a hidden Markov model (HMM), state duration probabilities decrease exponentially with time, which fails to adequately represent the temporal structure of speech. One solution to this problem is to integrate state duration probability distributions explicitly into the HMM; the resulting model is known as a hidden semi-Markov model (HSMM). However, although a number of attempts to use HSMMs in speech recognition systems have been made, these systems are not consistent because various approximations are applied in both training and decoding. By avoiding these approximations through a generalized forward-backward algorithm, a context-dependent duration modeling technique, and weighted finite-state transducers (WFSTs), we construct a fully consistent HSMM-based speech recognition system. In a speaker-dependent continuous speech recognition experiment, our system achieved about 9.1% relative error reduction over the corresponding HMM-based system.
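The abstract contrasts the geometric duration behavior implied by an HMM self-loop with the explicit duration distributions of an HSMM, evaluated with a generalized forward-backward algorithm. The sketch below is only an illustration of that idea, not the authors' implementation: the function names, the toy two-state model, and the truncation of durations at max_d are all assumptions introduced here. It computes the geometric duration pmf implied by an HMM self-transition and runs a simple explicit-duration forward pass in the log domain.

```python
# A minimal sketch, assuming a toy model; not the paper's implementation.
import numpy as np

def hmm_duration_pmf(self_loop_prob, max_d):
    """Duration pmf implied by an HMM self-transition: geometric,
    i.e. the probability of staying d frames decays exponentially in d."""
    d = np.arange(1, max_d + 1)
    return (self_loop_prob ** (d - 1)) * (1.0 - self_loop_prob)

def hsmm_forward(log_obs, log_trans, log_dur, log_init, max_d):
    """Explicit-duration (generalized) forward pass in the log domain.

    log_obs[t, j]   : log b_j(o_t), per-frame observation log-likelihood
    log_trans[i, j] : log a_ij between distinct states (self-loops = -inf)
    log_dur[j, d-1] : log p_j(d), explicit state-duration log-probability
    log_init[j]     : log probability of starting in state j
    Returns log_alpha[t, j] = log P(first t+1 frames, state j ends at frame t).
    """
    T, N = log_obs.shape
    log_alpha = np.full((T, N), -np.inf)
    # Prefix sums of log_obs so each segment's log-likelihood is one subtraction.
    cum = np.vstack([np.zeros((1, N)), np.cumsum(log_obs, axis=0)])
    for t in range(T):
        for j in range(N):
            for d in range(1, min(max_d, t + 1) + 1):
                seg = cum[t + 1, j] - cum[t + 1 - d, j]   # sum of log b_j over the segment
                if d == t + 1:                            # segment covers the whole prefix
                    prev = log_init[j]
                else:                                     # some other state ended at t - d
                    prev = np.logaddexp.reduce(log_alpha[t - d] + log_trans[:, j])
                log_alpha[t, j] = np.logaddexp(log_alpha[t, j],
                                               prev + log_dur[j, d - 1] + seg)
    return log_alpha

# Toy usage: two states that must alternate; here the explicit duration pmf is
# set to the geometric one for comparison, but any pmf over 1..max_d can be used.
rng = np.random.default_rng(0)
log_obs = np.log(rng.dirichlet(np.ones(2), size=10))      # 10 frames, 2 states
log_trans = np.array([[-np.inf, 0.0], [0.0, -np.inf]])    # log of [[0, 1], [1, 0]]
log_dur = np.log(np.tile(hmm_duration_pmf(0.7, 5), (2, 1)))
log_init = np.log(np.array([0.5, 0.5]))
print(hsmm_forward(log_obs, log_trans, log_dur, log_init, max_d=5)[-1])
```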
Pages: 2693-2700
Number of pages: 8
Related papers
50 items in total
  • [1] A hidden semi-Markov model-based speech synthesis system
    Zen, Heiga
    Tokuda, Keiichi
    Masuko, Takashi
    Kobayashi, Takao
    Kitamura, Tadashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (05): : 825 - 834
  • [2] Joint Audiovisual Hidden Semi-Markov Model-Based Speech Synthesis
    Schabus, Dietmar
    Pucher, Michael
    Hofer, Gregor
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 336 - 347
  • [3] Hidden semi-Markov model based speech recognition system using weighted finite-state transducer
    Oura, Keiichiro
    Zen, Heiga
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 33 - 36
  • [4] AUTOREGRESSIVE VARIATIONAL AUTOENCODER WITH A HIDDEN SEMI-MARKOV MODEL-BASED STRUCTURED ATTENTION FOR SPEECH SYNTHESIS
    Fujimoto, Takato
    Hashimoto, Kei
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7462 - 7466
  • [5] Hidden Markov model-based speech emotion recognition
    Schuller, B
    Rigoll, G
    Lang, M
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 1 - 4
  • [6] Hidden Markov model-based speech emotion recognition
    Schuller, B
    Rigoll, G
    Lang, M
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 401 - 404
  • [7] A Bayesian Approach to Hidden Semi-Markov Model Based Speech Synthesis
    Hashimoto, Kei
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1719 - 1722
  • [8] Style Estimation of Speech Based on Multiple Regression Hidden Semi-Markov Model
    Nose, Takashi
    Kato, Yoichi
    Kobayashi, Takao
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2900 - 2903
  • [9] Machine condition recognition via hidden semi-Markov model
    Yang, Wenhui
    Chen, Lu
    COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 158 (158)
  • [10] STATE CLUSTERING IN HIDDEN MARKOV MODEL-BASED CONTINUOUS SPEECH RECOGNITION
    YOUNG, SJ
    WOODLAND, PC
    COMPUTER SPEECH AND LANGUAGE, 1994, 8 (04): : 369 - 383