Effective Lip Localization and Tracking for Achieving Multimodal Speech Recognition

Cited by: 0
Authors:
Ooi, Wei Chuan [1]
Jeon, Changwon [1]
Kim, Kihyeon [1]
Han, David K. [1]
Ko, Hanseok [1]
Affiliations:
[1] Korea University, School of Electrical Engineering, Seoul, South Korea
Keywords: none listed
DOI: not available
CLC number: TP [automation and computer technology]
Discipline code: 0812
Abstract
Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in human-computer interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to lip localization and tracking, aimed at improving speech recognition in noisy environments. The hybrid approach begins with a new color-space transformation that enhances lip segmentation: a PCA method is employed to derive a one-dimensional color space that maximizes the discrimination between lip and non-lip colors, and intensity information is incorporated to improve the contrast of the upper and corner lip segments. In the subsequent step, a constrained yet highly flexible deformable lip model is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motions under various measurement conditions.
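As an illustration of the two steps the abstract describes, the sketch below derives a one-dimensional PCA color axis from labeled lip/non-lip pixel samples and fits a six-parameter lip contour by closed-form least squares. This is a minimal reconstruction, not the authors' code: the pooled-covariance PCA axis, the parabolic half-contour model, and the parameter set (center, rotation, half-width, upper/lower heights) are all assumptions made for illustration, since the paper's exact formulation is not given in this record.

```python
# Hypothetical sketch (not the authors' implementation) of the two
# stages described in the abstract: a PCA-derived 1-D color space for
# lip segmentation, and a least-squares fit of a six-DOF lip contour.
import numpy as np


def pca_lip_axis(lip_rgb, nonlip_rgb):
    """Derive a 1-D color axis from labeled (N, 3) RGB samples.

    Assumption: the discriminative axis is approximated by the first
    principal component of the pooled, mean-centered samples.
    """
    samples = np.vstack([lip_rgb, nonlip_rgb]).astype(np.float64)
    mean = samples.mean(axis=0)
    cov = np.cov(samples - mean, rowvar=False)   # 3x3 color covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, np.argmax(eigvals)]        # principal direction
    return mean, axis


def project_to_lip_space(rgb_image, mean, axis):
    """Map an (H, W, 3) RGB image onto the 1-D color space."""
    return (rgb_image.astype(np.float64) - mean) @ axis


def fit_lip_model(points):
    """Fit a six-DOF lip contour to (N, 2) boundary points.

    Assumed parameterization (6 DOF): mouth center (xc, yc), in-plane
    rotation theta, half-width w, and upper/lower lip heights h_up,
    h_lo, each half-contour modeled as a parabola
        y' = h * (1 - (x'/w)**2)
    in the rotated, centered frame.
    """
    xc, yc = points.mean(axis=0)
    p = points - np.array([xc, yc])
    # Dominant direction of the point cloud gives the rotation.
    u, _, _ = np.linalg.svd(p.T @ p)
    theta = np.arctan2(u[1, 0], u[0, 0])
    c, s = np.cos(-theta), np.sin(-theta)
    x = c * p[:, 0] - s * p[:, 1]
    y = s * p[:, 0] + c * p[:, 1]
    w = np.abs(x).max()
    basis = 1.0 - (x / w) ** 2                   # parabolic shape term
    up, lo = y >= 0.0, y < 0.0
    # Closed-form least squares for each height parameter.
    h_up = basis[up] @ y[up] / (basis[up] @ basis[up])
    h_lo = basis[lo] @ y[lo] / (basis[lo] @ basis[lo])
    return xc, yc, theta, w, h_up, h_lo
```

Under these assumptions every parameter has a closed-form estimate, so the fit can be re-run on each frame without iterative optimization, which is consistent with the abstract's claim that six degrees of freedom suffice for a simple least-squares fit.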
Pages: 649-652
Page count: 4