Fully automatic face recognition system using a combined audio-visual approach

Times cited: 6
Authors
Albiol, A [1]
Torres, L [2]
Delp, EJ [3]
Affiliations
[1] Univ Politecn Valencia, Dept Commun, Valencia, Spain
[2] Tech Univ Catalonia, Dept Signal Theory & Commun, Barcelona, Spain
[3] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
DOI
10.1049/ip-vis:20045082
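Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communications technology]
Discipline classification codes
0808; 0809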
Abstract
This paper presents a novel approach to fusing audio and video information that substantially improves the automatic recognition of people in video sequences. Audio and video information are first used independently to obtain confidence values indicating the likelihood that a specific person appears in a video shot. A post-classifier is then applied to fuse the audio and visual confidence values. The system has been tested on several news sequences, and the results indicate that a significant improvement in the recognition rate can be achieved when both modalities are used together.
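The late-fusion scheme summarized in the abstract (one confidence value per modality, combined by a post-classifier) can be sketched as follows. The abstract does not say which post-classifier the authors used; this sketch assumes a two-input logistic regression trained by batch gradient descent. The function names and the toy confidence values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Weights = Tuple[float, float, float]  # (w_audio, w_video, bias)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_post_classifier(samples: List[Tuple[float, float]], labels: List[int],
                          lr: float = 0.5, epochs: int = 2000) -> Weights:
    """Fit a logistic-regression post-classifier (an assumed stand-in) on
    (audio_conf, video_conf) pairs; label 1 means the person is in the shot."""
    w_a, w_v, b = 0.0, 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        g_a = g_v = g_b = 0.0
        for (a, v), y in zip(samples, labels):
            err = sigmoid(w_a * a + w_v * v + b) - y  # gradient of log-loss
            g_a += err * a
            g_v += err * v
            g_b += err
        w_a -= lr * g_a / n
        w_v -= lr * g_v / n
        b -= lr * g_b / n
    return w_a, w_v, b


def fuse(weights: Weights, audio_conf: float, video_conf: float) -> float:
    """Combine the two per-modality confidences into one fused score in (0, 1)."""
    w_a, w_v, b = weights
    return sigmoid(w_a * audio_conf + w_v * video_conf + b)


if __name__ == "__main__":
    # Hypothetical per-shot confidences: (audio, video); 1 = person present.
    train = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.4), (0.3, 0.9),
             (0.2, 0.3), (0.1, 0.2), (0.4, 0.1), (0.3, 0.2)]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    w = train_post_classifier(train, labels)
    print(fuse(w, 0.8, 0.3), fuse(w, 0.2, 0.2))
```

The toy data include shots where only one modality is confident (for example `(0.8, 0.4)` and `(0.3, 0.9)`), which is the situation where a learned combination can outperform either modality alone.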
Pages: 318-326
Page count: 9