HMM-based audio-visual speech recognition integrating geometric- and appearance-based visual features

Cited by: 19
Author
Chan, MT [1]
Affiliation
[1] Rockwell Sci Co, Thousand Oaks, CA 91360 USA
DOI
10.1109/MMSP.2001.962703
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
A good front end for visual feature extraction is an important element of an audio-visual speech recognition system. We propose a new visual feature representation that combines geometric- and pixel-based features. Using our previously developed contour-based lip-tracking algorithm, we automatically extract geometric features, including the height and width of the lips. Lip boundary tracking also allows accurate determination of a region of interest (ROI), from which we construct pixel-based features that are robust to variations in scale and translation. To reduce computation, we selected a subset of pixels in the center of the inner mouth area; this subset was found to capture enough detail of the teeth and tongue to help discriminate between spoken words. We show the advantage of combining these visual features for both visual-only and audio-visual speech recognition of isolated digits.
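The abstract describes the front end only in outline. The following is a minimal Python/NumPy sketch of the general idea, not the paper's implementation, under these stated assumptions: the lip tracker's output is given as an N x 2 array of (x, y) contour points; lip "width" and "height" are taken as the horizontal and vertical extents of that contour; scale and translation are removed by cropping the contour's bounding box and resampling it to a fixed grid; and the inner-mouth pixel subset is approximated by a fixed central block of that grid. The function names, grid sizes (16 x 32, central 4 x 8), and nearest-neighbour resampling are hypothetical choices for illustration.

```python
import numpy as np


def geometric_features(contour: np.ndarray) -> np.ndarray:
    """Lip width and height from tracked contour points (N x 2 array of x, y).

    Assumption: width/height are the horizontal/vertical extents of the
    contour; the paper's exact definition may differ.
    """
    xs, ys = contour[:, 0], contour[:, 1]
    return np.array([xs.max() - xs.min(), ys.max() - ys.min()], dtype=float)


def normalized_roi(frame: np.ndarray, contour: np.ndarray,
                   out_h: int = 16, out_w: int = 32) -> np.ndarray:
    """Crop the contour's bounding box and resample it to a fixed grid.

    Cropping at the tracked box removes translation; resampling to a fixed
    out_h x out_w grid removes scale. Rounding evenly spaced indices gives
    nearest-neighbour sampling (a hypothetical choice, not the paper's).
    """
    x0, y0 = np.floor(contour.min(axis=0)).astype(int)
    x1, y1 = np.ceil(contour.max(axis=0)).astype(int)
    # Clip the bounding box to the image bounds.
    h, w = frame.shape
    x0, x1 = max(x0, 0), min(x1, w - 1)
    y0, y1 = max(y0, 0), min(y1, h - 1)
    rows = np.linspace(y0, y1, out_h).round().astype(int)
    cols = np.linspace(x0, x1, out_w).round().astype(int)
    # Scale 8-bit intensities to [0, 1].
    return frame[np.ix_(rows, cols)].astype(float) / 255.0


def center_pixels(roi: np.ndarray, keep_h: int = 4, keep_w: int = 8) -> np.ndarray:
    """Flatten a central keep_h x keep_w block of the normalized ROI.

    Hypothetical stand-in for the paper's inner-mouth pixel subset.
    """
    h, w = roi.shape
    top, left = (h - keep_h) // 2, (w - keep_w) // 2
    return roi[top:top + keep_h, left:left + keep_w].ravel()


def visual_feature(frame: np.ndarray, contour: np.ndarray) -> np.ndarray:
    """Concatenate geometric and pixel features into one per-frame vector."""
    roi = normalized_roi(frame, contour)
    return np.concatenate([geometric_features(contour), center_pixels(roi)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)
    # Hypothetical elliptical lip contour centred at (80, 60).
    t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
    contour = np.stack([80 + 30 * np.cos(t), 60 + 12 * np.sin(t)], axis=1)
    print(visual_feature(frame, contour).shape)  # (34,)
```

Concatenating the two parts yields one observation vector per video frame (2 geometric values plus 32 pixel values here), which an HMM recognizer could consume. How the paper weights or normalizes the two feature types relative to each other is not stated in this abstract.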
Pages: 9-14
Page count: 6
Related papers
  • [1] Combined Discriminative Training for Multi-Stream HMM-based Audio-Visual Speech Recognition
    Huang, Jing
    Visweswariah, Karthik
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, pp. 1399+
  • [2] DEMONSTRATION OF AN HMM-BASED PHOTOREALISTIC EXPRESSIVE AUDIO-VISUAL SPEECH SYNTHESIS SYSTEM
    Filntisis, Panagiotis Paraskevas
    Katsamanis, Athanasios
    Maragos, Petros
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, p. 4588
  • [3] A coupled HMM for audio-visual speech recognition
    Nefian, AV
    Liang, LH
    Pi, XB
    Xiaoxiang, L
    Mao, C
    Murphy, K
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, pp. 2013-2016
  • [4] Improved Decision Trees for Multi-stream HMM-based Audio-Visual Continuous Speech Recognition
    Huang, Jing
    Visweswariah, Karthik
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, pp. 228+
  • [5] SYNCHRONIZATION RULES FOR HMM-BASED AUDIO-VISUAL LAUGHTER SYNTHESIS
    Cakmak, Hueseyin
    Urbain, Jerome
    Dutoit, Thierry
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, pp. 2304-2308
  • [6] Analysis of lip geometric features for audio-visual speech recognition
    Kaynak, MN
    Zhi, Q
    Cheok, AD
    Sengupta, K
    Han, Z
    Chung, KC
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2004, Vol. 34, No. 4, pp. 564-570
  • [7] Depth-based Features in Audio-Visual Speech Recognition
    Palecek, Karel
    Chaloupka, Josef
    2016 39TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2016, pp. 303-306
  • [8] Rapid feature space speaker adaptation for multi-stream HMM-based audio-visual speech recognition
    Huang, J
    Marcheret, E
    Visweswariah, K
    2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005, pp. 338-341
  • [9] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, Vol. 211
  • [10] Using Twin-HMM-Based Audio-Visual Speech Enhancement as a Front-End for Robust Audio-Visual Speech Recognition
    Abdelaziz, Ahmed Hussen
    Zeiler, Steffen
    Kolossa, Dorothea
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, pp. 867-871