Real time face detection for multimodal speech recognition

Cited by: 0

Authors:

Murai, K [1]

Nakamura, S [1]

Affiliation:

[1] Fuji Xerox, Informat Media Lab, Kanagawa 2590157, Japan

Source:

IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS | 2002

Keywords:

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we propose a real-time system that detects the speaker's frontal face for multimodal speech recognition. It is widely acknowledged that automatic speech recognizers, like humans, can improve recognition performance by adding the visual modality, i.e., the speaker's facial image, to the audio modality [1][2]. The visual modality also provides inaudible information, such as the speaker's facial orientation [3] and the location of the mouth. To acquire this information, the speaker's face must be localized in real time. Our system combines skin color detection with spatial feature detection. Color-based detection is fast but depends on the skin and background colors, while spatial feature detection requires more computation. We apply color-based pruning to reduce the search space for the spatial feature detection. By detecting the facial orientation, the proposed method functions as a "Face to Talk" switch in place of a "Push to Talk" switch. In our experiment, color-based pruning reduced the search space by 53-97%, and the subsequent spatial detector correctly detected 98.9% of the frontal faces.
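
The two-stage idea in the abstract (a cheap color test that prunes the search space before a costlier spatial detector runs) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the `is_skin` rule in normalized rg chromaticity, its thresholds, the sliding-window layout, and the `min_skin_ratio` parameter are all hypothetical. The spatial feature detector itself is not shown; the sketch only decides which windows would be handed to it.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0-255
Image = List[List[Pixel]]     # rows of pixels


def is_skin(pixel: Pixel) -> bool:
    """Hypothetical skin test in normalized rg chromaticity space."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return False
    rn, gn = r / total, g / total
    # Illustrative thresholds only; the paper's color model is not given here.
    return 0.36 <= rn <= 0.55 and 0.26 <= gn <= 0.36


def candidate_windows(
    image: Image, win: int, step: int, min_skin_ratio: float = 0.6
) -> Tuple[List[Tuple[int, int]], int]:
    """Return (kept windows as (top, left), total number of windows scanned).

    A window is kept only if at least min_skin_ratio of its pixels pass
    the color test; only kept windows would go to the spatial detector.
    """
    height, width = len(image), len(image[0])
    mask = [[is_skin(p) for p in row] for row in image]
    kept: List[Tuple[int, int]] = []
    total = 0
    for top in range(0, height - win + 1, step):
        for left in range(0, width - win + 1, step):
            total += 1
            skin = sum(
                mask[y][x]
                for y in range(top, top + win)
                for x in range(left, left + win)
            )
            if skin / (win * win) >= min_skin_ratio:
                kept.append((top, left))
    return kept, total
```

Given the return value `kept, total`, the pruned share of the search space is `1 - len(kept) / total`, which is the kind of quantity the abstract reports as a 53-97% reduction. A production version would likely precompute an integral image of the mask so each window's skin count costs constant time.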


Pages: A373-A376

Page count: 4

