Robust audiovisual integration using semicontinuous hidden Markov models

Cited by: 0
Authors
Su, Q
Silsbee, PL
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We describe an improved method of integrating audio and visual information in an HMM-based audiovisual ASR system. The method uses a modified semicontinuous HMM (SCHMM) for integration and recognition. Our results show substantial improvements over earlier integration methods at high noise levels. Our integration method relies on the assumption that, as environmental conditions deviate from those under which training occurred, the underlying probability distributions will also change. We use phoneme-based SCHMMs for classification of isolated words. The probability models underlying the standard SCHMM are Gaussian; thus, low probability estimates will tend to be associated with high confidences (small differences in the feature values cause large proportional differences in probabilities when the values are in the tail of the distribution). Therefore, during classification, we replace each Gaussian with a scoring function which looks Gaussian near the mean of the distribution but has a heavier tail. We report results comparing this method with an audio-only system and with previous integration methods. At high noise levels, the system with modified scoring functions shows a better than 50% [...] recognition does suffer when noise is low. Methods which can adjust the relative weight of the audio and visual information can still potentially outperform the new method, provided that a reliable way of choosing those weights can be found.
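The idea of a scoring function that is Gaussian near the mean but heavier in the tail can be illustrated with a small sketch. The abstract does not give the authors' actual function, so the Huber-style form below is an assumption chosen only for illustration: the log score is quadratic within `k` standard deviations of the mean and linear beyond that. The names `gaussian_log_score`, `heavy_tail_log_score`, and the threshold `k` are hypothetical.

```python
import math


def gaussian_log_score(x, mean, std):
    """Log of the Gaussian density at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def heavy_tail_log_score(x, mean, std, k=2.0):
    """Illustrative heavy-tailed score (an assumption, not the paper's function).

    Matches the Gaussian log density for |z| <= k and falls off linearly
    in |z| beyond k. It is a score, not a normalized density.
    """
    z = abs(x - mean) / std
    norm = -math.log(std * math.sqrt(2.0 * math.pi))
    if z <= k:
        return -0.5 * z * z + norm
    # Linear tail, continuous with the quadratic part at z == k.
    return -k * z + 0.5 * k * k + norm


if __name__ == "__main__":
    # Compare how fast each score drops as the feature moves into the tail.
    for x in (0.0, 1.0, 3.0, 6.0, 7.0):
        g = gaussian_log_score(x, 0.0, 1.0)
        h = heavy_tail_log_score(x, 0.0, 1.0)
        print(f"x={x:4.1f}  gaussian={g:8.3f}  heavy_tail={h:8.3f}")
```

With this choice, moving one standard deviation further into the tail lowers the heavy-tailed log score by a fixed amount `k`, whereas the Gaussian log score drops by an amount that grows with distance. That is one way to keep a mismatched, noisy stream from dominating the combined score with overconfident low probabilities.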
Pages: 42-45
Page count: 4