Effective Lip Localization and Tracking for Achieving Multimodal Speech Recognition

Cited by: 0
Authors:
Ooi, Wei Chuan [1]
Jeon, Changwon [1]
Kim, Kihyeon [1]
Han, David K. [1]
Ko, Hanseok [1]
Affiliations:
[1] Korea University, School of Electrical Engineering, Seoul, South Korea
Keywords: none listed
DOI: not available
CLC number: TP [automation and computer technology]
Discipline code: 0812
Abstract
Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in human-computer interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to lip localization and tracking, aimed at improving speech recognition in noisy environments. The hybrid approach begins with a new color-space transformation that enhances lip segmentation: a PCA method is employed to derive a one-dimensional color space that maximizes the discrimination between lip and non-lip colors, and intensity information is incorporated to improve the contrast of the upper and corner lip segments. In the subsequent step, a constrained yet highly flexible deformable lip model is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motions under various measurement conditions.
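As an illustration of the two steps the abstract describes, the sketch below derives a one-dimensional PCA color axis from labeled lip/non-lip pixel samples and fits a six-parameter lip contour by closed-form least squares. This is a minimal reconstruction, not the authors' code: the pooled-covariance PCA axis, the parabolic half-contour model, and the parameter set (center, rotation, half-width, upper/lower heights) are all assumptions made for illustration, since the paper's exact formulation is not given in this record.

```python
# Hypothetical sketch (not the authors' implementation) of the two
# stages described in the abstract: a PCA-derived 1-D color space for
# lip segmentation, and a least-squares fit of a six-DOF lip contour.
import numpy as np


def pca_lip_axis(lip_rgb, nonlip_rgb):
    """Derive a 1-D color axis from labeled (N, 3) RGB samples.

    Assumption: the discriminative axis is approximated by the first
    principal component of the pooled, mean-centered samples.
    """
    samples = np.vstack([lip_rgb, nonlip_rgb]).astype(np.float64)
    mean = samples.mean(axis=0)
    cov = np.cov(samples - mean, rowvar=False)   # 3x3 color covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, np.argmax(eigvals)]        # principal direction
    return mean, axis


def project_to_lip_space(rgb_image, mean, axis):
    """Map an (H, W, 3) RGB image onto the 1-D color space."""
    return (rgb_image.astype(np.float64) - mean) @ axis


def fit_lip_model(points):
    """Fit a six-DOF lip contour to (N, 2) boundary points.

    Assumed parameterization (6 DOF): mouth center (xc, yc), in-plane
    rotation theta, half-width w, and upper/lower lip heights h_up,
    h_lo, each half-contour modeled as a parabola
        y' = h * (1 - (x'/w)**2)
    in the rotated, centered frame.
    """
    xc, yc = points.mean(axis=0)
    p = points - np.array([xc, yc])
    # Dominant direction of the point cloud gives the rotation.
    u, _, _ = np.linalg.svd(p.T @ p)
    theta = np.arctan2(u[1, 0], u[0, 0])
    c, s = np.cos(-theta), np.sin(-theta)
    x = c * p[:, 0] - s * p[:, 1]
    y = s * p[:, 0] + c * p[:, 1]
    w = np.abs(x).max()
    basis = 1.0 - (x / w) ** 2                   # parabolic shape term
    up, lo = y >= 0.0, y < 0.0
    # Closed-form least squares for each height parameter.
    h_up = basis[up] @ y[up] / (basis[up] @ basis[up])
    h_lo = basis[lo] @ y[lo] / (basis[lo] @ basis[lo])
    return xc, yc, theta, w, h_up, h_lo
```

Under these assumptions every parameter has a closed-form estimate, so the fit can be re-run on each frame without iterative optimization, which is consistent with the abstract's claim that six degrees of freedom suffice for a simple least-squares fit.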
Pages: 649-652
Page count: 4