Fully automatic face recognition system using a combined audio-visual approach

被引:6
|
作者
Albiol, A [1 ]
Torres, L
Delp, EJ
机构
[1] Univ Politecn Valencia, Dept Commun, Valencia, Spain
[2] Tech Univ Catalonia, Dept Signal Theory & Commun, Barcelona, Spain
[3] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
来源
关键词
D O I
10.1049/ip-vis:20045082
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
This paper presents a novel audio and video information fusion approach that greatly improves automatic recognition of people in video sequences. To that end, audio and video information is first used independently to obtain confidence values that indicate the likelihood that a specific person appears in a video shot. Finally, a post-classifier is applied to fuse audio and visual confidence values. The system has been tested on several newssequences and the results indicate that a significant improvement in the recognition rate can be achieved when both modalities are used together.
引用
收藏
页码:318 / 326
页数:9
相关论文
共 50 条
  • [31] An audio-visual corpus for speech perception and automatic speech recognition (L)
    Cooke, Martin
    Barker, Jon
    Cunningham, Stuart
    Shao, Xu
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (05): : 2421 - 2424
  • [32] AUDIO-VISUAL EMOTION RECOGNITION USING BOLTZMANN ZIPPERS
    Lu, Kun
    Jia, Yunde
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 2589 - 2592
  • [33] Audio-visual speech recognition using deep learning
    Noda, Kuniaki
    Yamaguchi, Yuki
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    Ogata, Tetsuya
    APPLIED INTELLIGENCE, 2015, 42 (04) : 722 - 737
  • [34] Early visual deprivation affects the development of face recognition and of audio-visual speech perception
    Putzar, Lisa
    Hoetting, Kirsten
    Roeder, Brigitte
    RESTORATIVE NEUROLOGY AND NEUROSCIENCE, 2010, 28 (02) : 251 - 257
  • [35] Audio-visual speech recognition using deep learning
    Kuniaki Noda
    Yuki Yamaguchi
    Kazuhiro Nakadai
    Hiroshi G. Okuno
    Tetsuya Ogata
    Applied Intelligence, 2015, 42 : 722 - 737
  • [36] Audio-visual speech recognition using an infrared headset
    Huang, J
    Potamianos, G
    Connell, J
    Neti, C
    SPEECH COMMUNICATION, 2004, 44 (1-4) : 83 - 96
  • [37] Audio-Visual (Multimodal) Speech Recognition System Using Deep Neural Network
    Paulin, Hebsibah
    Milton, R. S.
    JanakiRaman, S.
    Chandraprabha, K.
    JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) : 3963 - 3974
  • [38] Audio-Visual Emotion Recognition System Using Multi-Modal Features
    Handa, Anand
    Agarwal, Rashi
    Kohli, Narendra
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [39] An audio-visual approach to simultaneous-speaker speech recognition
    Patterson, EK
    Gowdy, JN
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING, 2003, : 780 - 783
  • [40] USING MULTIPLE VISUAL TANDEM STREAMS IN AUDIO-VISUAL SPEECH RECOGNITION
    Topkaya, Ibrahim Saygin
    Erdogan, Hakan
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4988 - 4991