Effective Lip Localization and Tracking for Achieving Multimodal Speech Recognition

Cited by: 0
Authors
Ooi, Wei Chuan [1 ]
Jeon, Changwon [1 ]
Kim, Kihyeon [1 ]
Han, David K. [1 ]
Ko, Hanseok [1 ]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul, South Korea
Keywords
DOI: not available
CLC classification: TP [Automation Technology; Computer Technology]
Discipline code: 0812
Abstract
Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in human-computer interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to lip localization and tracking, aimed at improving speech recognition in noisy environments. The approach begins with a new color space transformation that enhances lip segmentation: a PCA method derives a one-dimensional color space that maximizes discrimination between lip and non-lip colors, and intensity information is incorporated to improve contrast in the upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motion under various measurement conditions.
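The two steps the abstract describes can be sketched in a few lines. The snippet below is an illustrative stand-in, not the paper's actual algorithm: it picks a 1-D color axis from the principal components of pooled lip/non-lip samples (using class-mean separation as a proxy for the paper's discriminative criterion), and models the upper and lower lip boundaries as two parabolas, whose 2 × 3 = 6 coefficients echo the six-degree-of-freedom model fitted by least squares. All function names and the toy data are assumptions for illustration.

```python
import numpy as np

def pca_color_axis(lip_rgb, nonlip_rgb):
    # Pool the labeled color samples and compute the 3x3 covariance.
    X = np.vstack([lip_rgb, nonlip_rgb]).astype(float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    # Principal axes of the pooled color distribution (unit eigenvectors).
    _, eigvecs = np.linalg.eigh(cov)
    # Keep the axis along which the lip / non-lip class means are farthest
    # apart -- a simple proxy for maximizing lip vs. non-lip discrimination.
    gaps = np.abs((lip_rgb.mean(axis=0) - nonlip_rgb.mean(axis=0)) @ eigvecs)
    return eigvecs[:, np.argmax(gaps)]

def fit_lip_parabolas(xu, yu, xl, yl):
    # Upper and lower boundaries as y = a*x^2 + b*x + c: two parabolas give
    # six parameters (the paper's constrained template differs in detail).
    return np.polyfit(xu, yu, 2), np.polyfit(xl, yl, 2)

# Toy demo: reddish "lip" pixels vs. skin-toned "non-lip" pixels.
rng = np.random.default_rng(0)
lip = rng.normal([180, 80, 90], 10.0, size=(200, 3))
skin = rng.normal([200, 160, 140], 10.0, size=(200, 3))
w = pca_color_axis(lip, skin)
print("class-mean separation on the 1-D axis:",
      abs((lip.mean(0) - skin.mean(0)) @ w))

x = np.linspace(-1.0, 1.0, 21)
up, lo = fit_lip_parabolas(x, 0.5 * x**2 - 0.5, x, -0.4 * x**2 + 0.4)
print("upper lip coefficients:", np.round(up, 3))
```

Projecting every pixel onto `w` yields a single-channel image in which lip pixels separate from skin, after which the parabola fit needs only candidate boundary points, not a dense segmentation.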
Pages: 649 - 652 (4 pages)
Related papers (50 total)
  • [21] Towards the explainability of Multimodal Speech Emotion Recognition
    Kumar, Puneet
    Kaushik, Vishesh
    Raman, Balasubramanian
    INTERSPEECH 2021, 2021, : 1748 - 1752
  • [22] Temporal Multimodal Learning in Audiovisual Speech Recognition
    Hu, Di
    Li, Xuelong
    Lu, Xiaoqiang
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3574 - 3582
  • [23] CONTINUOUS VISUAL SPEECH RECOGNITION FOR MULTIMODAL FUSION
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [24] Multimodal speech recognition for unmanned aerial vehicles
    Oneata, Dan
    Cucu, Horia
    COMPUTERS & ELECTRICAL ENGINEERING, 2021, 90
  • [25] END-TO-END MULTIMODAL SPEECH RECOGNITION
    Palaskar, Shruti
    Sanabria, Ramon
    Metze, Florian
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5774 - 5778
  • [26] Improved Lip Contour Extraction For Visual Speech Recognition
    Chalamala, Srinivasa Rao
    Gudla, Balakrishna
    Yegnanarayana, B.
    Sheela, Anitha K.
    2015 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2015, : 459 - 462
  • [27] Lip location normalized training for visual speech recognition
    Vanegas, Oscar
    Tokuda, Keiichi
    Kitamura, Tadashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2000, E83-D (11): 1969 - 1977
  • [29] Visual Lip Contour Detection for the Purpose of Speech Recognition
    Dalka, Piotr
    Bratoszewski, Piotr
    Czyzewski, Andrzej
    2014 INTERNATIONAL CONFERENCE ON SIGNALS AND ELECTRONIC SYSTEMS (ICSES), 2014,
  • [30] Speech recognition in adverse environments using lip information
    Thambiratnam, D
    Wark, T
    Sridharan, S
    Chandran, V
    IEEE TENCON'97 - IEEE REGIONAL 10 ANNUAL CONFERENCE, PROCEEDINGS, VOLS 1 AND 2: SPEECH AND IMAGE TECHNOLOGIES FOR COMPUTING AND TELECOMMUNICATIONS, 1997, : 149 - 152