Recognition of visual speech elements using adaptively boosted hidden Markov models

Cited by: 29
Authors
Foo, SW [1]
Lian, Y
Dong, L
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119260, Singapore
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Digital Syst & Applicat Lab, Singapore 117548, Singapore
Keywords
adaptive boosting (AdaBoost); automatic lip reading; hidden Markov model (HMM); visual speech processing
DOI
10.1109/TCSVT.2004.826773
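Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract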
The performance of automatic speech recognition (ASR) systems can be significantly enhanced with additional information from visual speech elements such as the movement of the lips, tongue, and teeth, especially in noisy environments. In this paper, a novel approach to the recognition of visual speech elements is presented. The approach uses adaptive boosting (AdaBoost) and hidden Markov models (HMMs) to build an AdaBoost-HMM classifier. The composite HMMs of the AdaBoost-HMM classifier are trained to cover different groups of training samples using the AdaBoost technique and the biased Baum-Welch training method. By combining the decisions of the component classifiers of the composite HMMs according to a novel probability synthesis rule, a more complex decision boundary is formed than a single HMM classifier can provide. The method is applied to the recognition of the basic visual speech elements. Experimental results show that the AdaBoost-HMM classifier outperforms the traditional HMM classifier in accuracy, especially for visemes extracted from contexts.
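As a rough illustration of how boosting steers successive component classifiers toward different groups of training samples, the following is a minimal sketch of one round of textbook discrete AdaBoost reweighting. It is not the paper's method: the biased Baum-Welch training and the probability synthesis rule are not specified in this abstract and are left out. The function name `adaboost_round` and the boolean `correct` flags (standing in for the decisions of an already trained component classifier) are assumptions made for illustration.

```python
import math


def adaboost_round(weights, correct):
    """One round of standard discrete AdaBoost sample reweighting.

    weights: current sample weights, assumed to sum to 1.
    correct: booleans, True where the current component classifier
             (hypothetical here; an HMM in the paper's setting)
             classified the sample correctly.
    Returns (alpha, new_weights).
    """
    # Weighted error of the current component classifier.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")

    # Vote weight of this component classifier in the final ensemble.
    alpha = 0.5 * math.log((1.0 - err) / err)

    # Shrink the weights of correctly classified samples, grow the rest,
    # then renormalise so the weights sum to 1 again.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


# Four equally weighted samples; the last one is misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, True, True, False])
```

After the update, the misclassified sample carries half of the total weight, so the next component classifier is pushed to model the samples the previous one handled poorly.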
Pages: 693-705
Number of pages: 13
Related papers
50 in total
  • [41] Stressed speech recognition using multi-dimensional Hidden Markov Models
    Womack, BD
    Hansen, JHL
    [J]. 1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997, : 404 - 411
  • [42] AUTOMATIC SPEECH RECOGNITION USING TIED DENSITY HIDDEN MARKOV-MODELS
    EULER, S
    [J]. FREQUENZ, 1992, 46 (11-12) : 274 - 279
  • [43] A Speech Recognition IC Using Hidden Markov Models with Continuous Observation Densities
    Wei Han
    Kwok-Wai Hon
    Cheong-Fat Chan
    Chiu-Sing Choy
    Kong-Pang Pun
    [J]. The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 2007, 47 : 223 - 232
  • [44] Whispered Speech Recognition using Hidden Markov Models and Support Vector Machines
    Galic, Jovan
    Popovic, Branislav
    Pavlovic, Dragana Sumarac
    [J]. ACTA POLYTECHNICA HUNGARICA, 2018, 15 (05) : 11 - 29
  • [45] Recognition of vowel segments in Spanish esophageal speech using Hidden Markov Models
    Mantilla, Alfredo
    Perez-Meana, Hector
    Mata, Daniel
    Angeles, Carlos
    Alvarado, Jorge
    Cabrera, Laura
    [J]. CIC 2006: 15TH INTERNATIONAL CONFERENCE ON COMPUTING, PROCEEDINGS, 2006, : 115 - +
  • [46] Speaker-independent embedded speech recognition using Hidden Markov Models
    Marufo da Silva, Mariano
    Evin, Diego A.
    Verrastro, Sebastian
    [J]. IEEE CACIDI 2016 - IEEE CONFERENCE ON COMPUTER SCIENCES, 2016,
  • [47] A speech recognition IC using hidden markov models with continuous observation densities
    Han, Wei
    Hon, Kwok-Wai
    Chan, Cheong-Fat
    Choy, Chiu-Sing
    Pun, Kong-Pang
    [J]. JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2007, 47 (03): : 223 - 232
  • [48] Hybrid Simulated Annealing and Its Application to Optimization of Hidden Markov Models for Visual Speech Recognition
    Lee, Jong-Seok
    Park, Cheol Hoon
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2010, 40 (04): : 1188 - 1196
  • [49] Characteristics of the use of coupled hidden Markov models for audio-visual Polish speech recognition
    Kubanek, M.
    Bobulski, J.
    Adrjanowicz, L.
    [J]. BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2012, 60 (02) : 307 - 316
  • [50] BAYESIAN LARGE MARGIN HIDDEN MARKOV MODELS FOR SPEECH RECOGNITION
    Chen, Jung-Chun
    Chien, Jen-Tzung
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3765 - 3768