Effective Lip Localization and Tracking for Achieving Multimodal Speech Recognition

Cited by: 0
Authors
Ooi, Wei Chuan [1 ]
Jeon, Changwon [1 ]
Kim, Kihyeon [1 ]
Han, David K. [1 ]
Ko, Hanseok [1 ]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul, South Korea
Keywords
DOI: not available
CLC number: TP [Automation Technology, Computer Technology]
Discipline code: 0812
Abstract
Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in human-computer interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to improve lip localization and tracking, aimed at improving speech recognition in noisy environments. This hybrid approach begins with a new color space transformation for enhancing lip segmentation. In the color space transformation, a PCA method is employed to derive a new one-dimensional color space which maximizes discrimination between lip and non-lip colors. Intensity information is also incorporated in the process to improve contrast of upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motions under various measurement conditions.
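The color-space step described in the abstract (PCA over lip and non-lip pixel colors, projected to one dimension) can be sketched roughly as follows. The abstract does not give the exact PCA formulation, the training data, or the intensity-contrast refinement, so the helper `lip_color_axis` and its rule for selecting the retained component are assumptions:

```python
import numpy as np

def lip_color_axis(lip_rgb, nonlip_rgb):
    """Derive a 1-D color projection separating lip from non-lip pixels.

    Sketch only: among the principal components of the pooled pixel
    colors, keep the one along which the two class means are farthest
    apart (an assumed selection rule, not the authors' stated one).
    """
    X = np.vstack([lip_rgb, nonlip_rgb]).astype(float)
    # PCA: eigen-decompose the covariance of the pooled color samples.
    vals, vecs = np.linalg.eigh(np.cov(X - X.mean(axis=0), rowvar=False))
    # Keep the component that best separates the class means in 1-D.
    sep = np.abs((np.mean(lip_rgb, axis=0) - np.mean(nonlip_rgb, axis=0)) @ vecs)
    return vecs[:, np.argmax(sep)]  # unit 3-vector; project RGB onto it
```

Projecting each frame pixel's RGB value onto the returned axis yields a one-dimensional map in which lip pixels stand out, suitable for thresholding before the deformable-model fitting stage.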
Pages: 649-652
Page count: 4
Related papers (50 total)
  • [41] Dual Memory Fusion for Multimodal Speech Emotion Recognition
    Priyasad, Darshana
    Fernando, Tharindu
    Sridharan, Sridha
    Denman, Simon
    Fookes, Clinton
    INTERSPEECH 2023, 2023, : 4543 - 4547
  • [42] LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES
    Sun, Felix
    Harwath, David
    Glass, James
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 573 - 578
  • [43] Improving Recognition of Speech System Using Multimodal Approach
    Radha, N.
    Shahina, A.
    Khan, A. Nayeemulla
    INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATIONS, VOL 2, 2019, 56 : 397 - 410
  • [44] Fine-Grained Grounding for Multimodal Speech Recognition
    Srinivasan, Tejas
    Sanabria, Ramon
    Metze, Florian
    Elliott, Desmond
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2667 - 2677
  • [46] Speech recognition technology combined with three dimensional lip movement
    Komiya, K
    Isikawa, R
    Momose, K
    THREE-DIMENSIONAL IMAGE CAPTURE AND APPLICATIONS IV, 2001, 4298 : 95 - 102
  • [47] Multimodal transformer augmented fusion for speech emotion recognition
    Wang, Yuanyuan
    Gu, Yu
    Yin, Yifei
    Han, Yingping
    Zhang, He
    Wang, Shuang
    Li, Chenyu
    Quan, Dou
    FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [48] Speech Emotion Recognition Adapted to Multimodal Semantic Repositories
    Vryzas, Nikolaos
    Vrysis, Lazaros
    Kotsakis, Rigas
    Dimoulas, Charalampos
    2018 13TH INTERNATIONAL WORKSHOP ON SEMANTIC AND SOCIAL MEDIA ADAPTATION AND PERSONALIZATION (SMAP 2018), 2018, : 31 - 35
  • [49] Bayesian networks in multimodal speech recognition and speaker identification
    Nefian, AV
    Liang, LH
    CONFERENCE RECORD OF THE THIRTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 2003, : 2004 - 2008
  • [50] Deep Multimodal Emotion Recognition on Human Speech: A Review
    Koromilas, Panagiotis
    Giannakopoulos, Theodoros
    APPLIED SCIENCES-BASEL, 2021, 11 (17):