Real time face detection for multimodal speech recognition

Times cited: 0
Authors
Murai, K [1]
Nakamura, S [1]
Affiliation
[1] Fuji Xerox, Informat Media Lab, Kanagawa 2590157, Japan
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
In this paper, we propose a real-time system that detects the speaker's frontal face for multimodal speech recognition. It is widely acknowledged that automatic speech recognizers, like humans, can improve recognition performance by adding the visual modality, i.e., the speaker's facial image, to the audio modality [1][2]. The visual modality also provides information that the audio does not carry, such as the speaker's facial orientation [3] and the location of the mouth. To acquire this information, the speaker's face must be localized in real time. Our system combines skin color detection with spatial feature detection. Color-based detection is fast but depends on the skin and background colors, while spatial feature detection requires more computation. We therefore apply color-based pruning to reduce the search space of the spatial feature detector. Because it detects the facial orientation, the proposed method can serve as a "Face to Talk" switch in place of a "Push to Talk" switch: the user faces the system instead of pressing a button. In our experiments, color-based pruning reduced the search space by 53-97%, and the subsequent spatial detector correctly detected 98.9% of frontal faces.
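The abstract does not specify the skin color model, the window geometry, or the spatial feature detector, so the following is only a minimal Python sketch of the general "prune by color, then verify spatially" idea under stated assumptions. It assumes an explicit RGB skin rule (a common heuristic swapped in here, not the authors' model), square sliding windows whose skin counts are read from a summed-area table of the skin mask, and a hypothetical `spatial_detector` callable standing in for the paper's spatial feature detector. Windows whose skin fraction is below the hypothetical threshold `min_skin` are discarded before the expensive detector runs; the discarded fraction corresponds to the search-space reduction the abstract reports. All function names here are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]       # (R, G, B), each 0..255
Image = Sequence[Sequence[Pixel]]  # indexed as image[y][x]
Window = Tuple[int, int, int]      # (x, y, size): top-left corner of a square window


def is_skin(p: Pixel) -> bool:
    # Assumed explicit RGB skin rule (a common heuristic, not the paper's model).
    r, g, b = p
    return (r > 95 and g > 40 and b > 20 and max(p) - min(p) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_integral(img: Image) -> List[List[int]]:
    # Summed-area table: ii[y][x] = number of skin pixels in rows < y, columns < x.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += int(is_skin(img[y][x]))
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def prune_windows(img: Image, size: int, stride: int,
                  min_skin: float) -> Tuple[List[Window], int]:
    # Color-based pruning: keep only windows whose skin fraction >= min_skin.
    ii = skin_integral(img)
    h, w = len(img), len(img[0])
    kept: List[Window] = []
    total = 0
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            total += 1
            count = (ii[y + size][x + size] - ii[y][x + size]
                     - ii[y + size][x] + ii[y][x])
            if count >= min_skin * size * size:
                kept.append((x, y, size))
    return kept, total


def detect_faces(img: Image, size: int, stride: int, min_skin: float,
                 spatial_detector: Callable[[Image, Window], bool]
                 ) -> Tuple[List[Window], float]:
    # Run the (hypothetical, expensive) spatial detector only on surviving windows.
    kept, total = prune_windows(img, size, stride, min_skin)
    reduction = 1 - len(kept) / total if total else 0.0
    faces = [win for win in kept if spatial_detector(img, win)]
    return faces, reduction


def face_to_talk(faces: List[Window]) -> bool:
    # Hypothetical "Face to Talk" switch: enable recognition only while a
    # frontal face is detected (assumes spatial_detector accepts frontal faces only).
    return len(faces) > 0


if __name__ == "__main__":
    skin, background = (200, 120, 90), (30, 60, 200)
    frame = [[skin] * 4 + [background] * 4 for _ in range(8)]  # 8x8 toy frame
    faces, reduction = detect_faces(frame, size=4, stride=2, min_skin=0.6,
                                    spatial_detector=lambda im, win: True)
    print(f"{reduction:.0%}")   # 67%
    print(face_to_talk(faces))  # True
```

The summed-area table turns each window's skin count into four lookups, so the pruning pass stays cheap relative to the detector it gates. Raising `min_skin` removes more of the search space but risks discarding faces under unusual lighting, which mirrors the abstract's point that color-based detection depends on skin and background colors.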
Pages: A373-A376
Number of pages: 4
Related papers (50 in total)
  • [41] Multimodal systems for speech recognition
    Mamyrbayev, Orken Zh
    Alimhan, Keylan
    Amirgaliyev, Beibut
    Zhumazhanov, Bagashar
    Mussayeva, Dinara
    Gusmanova, Farida
    INTERNATIONAL JOURNAL OF MOBILE COMMUNICATIONS, 2020, 18 (03) : 314 - 326
  • [42] Multimodal recognition of speech and electrocorticogram
    Ahuja, Mitali
    Komeiji, Shuji
    Mitsuhashi, Takumi
    Iimura, Yasushi
    Suzuki, Hiroharu
    Sugano, Hidenori
    Shinoda, Koichi
    Tanaka, Toshihisa
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 546 - 550
  • [43] Integration of Face Detection and User Identification with Visual Speech Recognition
    Sagheer, Alaa
    Aly, Saleh
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT V, 2012, 7667 : 479 - 487
  • [44] Real time noise-speech discrimination in time domain for speech recognition application
    Mokhtar, N.
    Arof, H.
    Adikan, F. R. Mahamd
    Mubin, M.
    SCIENTIFIC RESEARCH AND ESSAYS, 2011, 6 (01): 18 - 22
  • [45] A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face
    Lian, Hailun
    Lu, Cheng
    Li, Sunan
    Zhao, Yan
    Tang, Chuangao
    Zong, Yuan
    ENTROPY, 2023, 25 (10)
  • [46] Real-time face recognition using eigenfaces
    Cendrillon, R
    Lovell, BC
    VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2000, PTS 1-3, 2000, 4067 : 269 - 276
  • [47] Hardware Solution For Real-time Face Recognition
    Mahale, Gopinath
    Mahale, Hamsika
    Goel, Arnav
    Nandy, S. K.
    Bhattacharya, S.
    Narayan, Ranjani
    2015 28TH INTERNATIONAL CONFERENCE ON VLSI DESIGN (VLSID), 2015, : 81 - 86
  • [48] Real Time Face Recognition using LBP Features
    Kulkarni, O. S.
    Deokar, S. M.
    Chaudhari, A. K.
    Patankar, S. S.
    Kulkarni, J. V.
    2017 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2017
  • [49] Implementation of real-time human face recognition
    Liu, HS
    Wu, MX
    Cheng, G
    Jin, GF
    Yuan, SF
    Yan, YB
    ALGORITHMS, DEVICES, AND SYSTEMS FOR OPTICAL INFORMATION PROCESSING, 1997, 3159 : 292 - 299
  • [50] OPTICAL NETWORK FOR REAL-TIME FACE RECOGNITION
    LI, HYS
    QIAO, Y
    PSALTIS, D
    APPLIED OPTICS, 1993, 32 (26): 5026 - 5035