Effective Lip Localization and Tracking for Achieving Multimodal Speech Recognition

Cited by: 0
Authors
Ooi, Wei Chuan [1 ]
Jeon, Changwon [1 ]
Kim, Kihyeon [1 ]
Han, David K. [1 ]
Ko, Hanseok [1 ]
Affiliation
[1] Korea Univ, Sch Elect Engn, Seoul, South Korea
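Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract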
Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in human-computer interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to lip localization and tracking, aimed at improving speech recognition in noisy environments. The approach begins with a new color space transformation that enhances lip segmentation. In this transformation, a PCA method is employed to derive a new one-dimensional color space that maximizes discrimination between lip and non-lip colors. Intensity information is also incorporated to improve the contrast of the upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motion under various measurement conditions.
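As a rough illustration of the color space step, the sketch below projects RGB pixels onto the first principal axis of a pooled lip and skin sample, giving one scalar per pixel. The abstract does not specify the training data, the input color representation, or how intensity is folded in, so everything concrete here is an assumption: the synthetic color clusters, the helper names (mean_vector, covariance, principal_axis, project), and the zero threshold are all hypothetical. Note also that plain PCA finds the direction of greatest variance; that direction separates the two classes here only because the between-class color difference dominates the within-class noise in the synthetic data.

```python
import random

def mean_vector(pixels):
    """Per-channel mean of a list of (R, G, B) triples."""
    n = len(pixels)
    return [sum(p[i] for p in pixels) / n for i in range(3)]

def covariance(pixels, mean):
    """3x3 sample covariance matrix of the pixels."""
    cov = [[0.0] * 3 for _ in range(3)]
    for p in pixels:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j]
    n = len(pixels)
    return [[c / (n - 1) for c in row] for row in cov]

def principal_axis(cov, iters=200):
    """Dominant eigenvector of cov, found by power iteration."""
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def project(pixel, mean, axis):
    """Map one RGB pixel to its coordinate in the 1-D color space."""
    return sum((pixel[i] - mean[i]) * axis[i] for i in range(3))

# Assumed synthetic training pixels: reddish "lip" and lighter "skin" clusters.
random.seed(0)
lip = [[random.gauss(m, 8.0) for m in (180, 70, 90)] for _ in range(200)]
skin = [[random.gauss(m, 8.0) for m in (200, 150, 130)] for _ in range(200)]

pooled = lip + skin
mean = mean_vector(pooled)
axis = principal_axis(covariance(pooled, mean))

lip_scores = [project(p, mean, axis) for p in lip]
skin_scores = [project(p, mean, axis) for p in skin]
lip_mean = sum(lip_scores) / len(lip_scores)
skin_mean = sum(skin_scores) / len(skin_scores)

# Classify by the sign of the projection, oriented so lips score positive.
lip_sign = 1.0 if lip_mean > skin_mean else -1.0
correct = sum(1 for s in lip_scores if s * lip_sign > 0)
correct += sum(1 for s in skin_scores if s * lip_sign <= 0)
accuracy = correct / len(pooled)

print("principal axis:", [round(a, 3) for a in axis])
print("mean lip score:", round(lip_mean, 1), "mean skin score:", round(skin_mean, 1))
print("accuracy at zero threshold:", accuracy)
```

On real images the lip and skin samples would come from labeled face regions rather than synthetic clusters, and a supervised projection might discriminate better than unsupervised PCA when within-class variation is large.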
Pages: 649-652
Number of pages: 4
Related papers
50 in total
  • [1] Multimodal speaker/speech recognition using lip motion, lip texture and audio
    Cetingul, H. E.
    Erzin, E.
    Yemez, Y.
    Tekalp, A. M.
    SIGNAL PROCESSING, 2006, 86 (12) : 3549 - 3558
  • [2] Real-time lip tracking and bimodal continuous speech recognition
    Chan, MT
    Zhang, Y
    Huang, TS
    1998 IEEE SECOND WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 1998, : 65 - 70
  • [3] Realtime lip contour tracking for audio-visual speech recognition applications
    Yazdi, Mehran
    Seyfi, Mehdi
    Rafati, Amirhossein
    Asadi, Meghdad
    World Academy of Science, Engineering and Technology, 2009, 40 : 164 - 167
  • [4] A robust hierarchical lip tracking approach for lipreading and audio visual speech recognition
    Xie, L
    Cai, XL
    Fu, ZH
    Zhao, RC
    Jiang, DM
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 3620 - 3624
  • [5] Lip Tracking Using Particle Filter and Geometric Model for Visual Speech Recognition
    Jarraya, Islem
    Werda, Salah
    Mahdi, Walid
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS (SIGMAP), 2014, : 172 - 179
  • [6] Lip Tracking Method for the System of Audio-Visual Polish Speech Recognition
    Kubanek, Mariusz
    Bobulski, Janusz
    Adrjanowicz, Lukasz
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I, 2012, 7267 : 535 - 542
  • [7] A hybrid approach for automatic lip localization and viseme classification to enhance visual speech recognition
    Mahdi, Walid
    Werda, Salah
    Ben Hamadou, Abdelmajid
    INTEGRATED COMPUTER-AIDED ENGINEERING, 2008, 15 (03) : 253 - 266
  • [8] A hybrid approach for automatic lip localization and viseme classification to enhance visual speech recognition
    Multimedia Information Systems and Advanced Computing Laboratory, High Institute of Computer Science and Multimedia, University of Sfax, Sfax, Tunisia
    Integr. Comput. Aided Eng., 2008, 3 (253-266):
  • [9] Multimodal systems for speech recognition
    Mamyrbayev, Orken Zh
    Alimhan, Keylan
    Amirgaliyev, Beibut
    Zhumazhanov, Bagashar
    Mussayeva, Dinara
    Gusmanova, Farida
    INTERNATIONAL JOURNAL OF MOBILE COMMUNICATIONS, 2020, 18 (03) : 314 - 326
  • [10] Multimodal recognition of speech and electrocorticogram
    Ahuja, Mitali
    Komeiji, Shuji
    Mitsuhashi, Takumi
    Iimura, Yasushi
    Suzuki, Hiroharu
    Sugano, Hidenori
    Shinoda, Koichi
    Tanaka, Toshihisa
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 546 - 550