Effective Lip Localization and Tracking for Achieving Multimodal Speech Recognition

Cited by: 0
Authors
Ooi, Wei Chuan [1 ]
Jeon, Changwon [1 ]
Kim, Kihyeon [1 ]
Han, David K. [1 ]
Ko, Hanseok [1 ]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul, South Korea
Keywords
DOI: not available
CLC classification: TP [Automation Technology; Computer Technology]
Discipline code: 0812
Abstract
Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in human-computer interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to lip localization and tracking, aimed at improving speech recognition in noisy environments. The approach begins with a new color space transformation that enhances lip segmentation: a PCA method derives a one-dimensional color space that maximizes discrimination between lip and non-lip colors, and intensity information is incorporated to improve contrast in the upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motion under various measurement conditions.
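The two steps the abstract describes can be sketched in a few lines. The snippet below is an illustrative stand-in, not the paper's actual algorithm: it picks a 1-D color axis from the principal components of pooled lip/non-lip samples (using class-mean separation as a proxy for the paper's discriminative criterion), and models the upper and lower lip boundaries as two parabolas, whose 2 × 3 = 6 coefficients echo the six-degree-of-freedom model fitted by least squares. All function names and the toy data are assumptions for illustration.

```python
import numpy as np

def pca_color_axis(lip_rgb, nonlip_rgb):
    # Pool the labeled color samples and compute the 3x3 covariance.
    X = np.vstack([lip_rgb, nonlip_rgb]).astype(float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    # Principal axes of the pooled color distribution (unit eigenvectors).
    _, eigvecs = np.linalg.eigh(cov)
    # Keep the axis along which the lip / non-lip class means are farthest
    # apart -- a simple proxy for maximizing lip vs. non-lip discrimination.
    gaps = np.abs((lip_rgb.mean(axis=0) - nonlip_rgb.mean(axis=0)) @ eigvecs)
    return eigvecs[:, np.argmax(gaps)]

def fit_lip_parabolas(xu, yu, xl, yl):
    # Upper and lower boundaries as y = a*x^2 + b*x + c: two parabolas give
    # six parameters (the paper's constrained template differs in detail).
    return np.polyfit(xu, yu, 2), np.polyfit(xl, yl, 2)

# Toy demo: reddish "lip" pixels vs. skin-toned "non-lip" pixels.
rng = np.random.default_rng(0)
lip = rng.normal([180, 80, 90], 10.0, size=(200, 3))
skin = rng.normal([200, 160, 140], 10.0, size=(200, 3))
w = pca_color_axis(lip, skin)
print("class-mean separation on the 1-D axis:",
      abs((lip.mean(0) - skin.mean(0)) @ w))

x = np.linspace(-1.0, 1.0, 21)
up, lo = fit_lip_parabolas(x, 0.5 * x**2 - 0.5, x, -0.4 * x**2 + 0.4)
print("upper lip coefficients:", np.round(up, 3))
```

Projecting every pixel onto `w` yields a single-channel image in which lip pixels separate from skin, after which the parabola fit needs only candidate boundary points, not a dense segmentation.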
Pages: 649 - 652 (4 pages)
Related papers (50 total)
  • [21] Towards the explainability of Multimodal Speech Emotion Recognition
    Kumar, Puneet
    Kaushik, Vishesh
    Raman, Balasubramanian
    INTERSPEECH 2021, 2021, : 1748 - 1752
  • [22] Temporal Multimodal Learning in Audiovisual Speech Recognition
    Hu, Di
    Li, Xuelong
    Lu, Xiaoqiang
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3574 - 3582
  • [23] CONTINUOUS VISUAL SPEECH RECOGNITION FOR MULTIMODAL FUSION
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [24] Multimodal speech recognition for unmanned aerial vehicles
    Oneata, Dan
    Cucu, Horia
    COMPUTERS & ELECTRICAL ENGINEERING, 2021, 90
  • [25] END-TO-END MULTIMODAL SPEECH RECOGNITION
    Palaskar, Shruti
    Sanabria, Ramon
    Metze, Florian
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5774 - 5778
  • [26] Improved Lip Contour Extraction For Visual Speech Recognition
    Chalamala, Srinivasa Rao
    Gudla, Balakrishna
    Yegnanarayana, B.
    Sheela, Anitha K.
    2015 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2015, : 459 - 462
  • [27] Lip location normalized training for visual speech recognition
    Vanegas, Oscar
    Tokuda, Keiichi
    Kitamura, Tadashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2000, E83-D (11): 1969 - 1977
  • [29] Visual Lip Contour Detection for the Purpose of Speech Recognition
    Dalka, Piotr
    Bratoszewski, Piotr
    Czyzewski, Andrzej
    2014 INTERNATIONAL CONFERENCE ON SIGNALS AND ELECTRONIC SYSTEMS (ICSES), 2014,
  • [30] Speech recognition in adverse environments using lip information
    Thambiratnam, D
    Wark, T
    Sridharan, S
    Chandran, V
    IEEE TENCON'97 - IEEE REGIONAL 10 ANNUAL CONFERENCE, PROCEEDINGS, VOLS 1 AND 2: SPEECH AND IMAGE TECHNOLOGIES FOR COMPUTING AND TELECOMMUNICATIONS, 1997, : 149 - 152