Audio-Visual Speaker Recognition for Video Broadcast News

Cited by: 0
Authors
Benoît Maison
Chalapathy Neti
Andrew Senior
Affiliation
[1] IBM Thomas J. Watson Research Center
Keywords
speaker identification; face recognition; multimodal; fusion; broadcast news
DOI
Not available
Abstract
Audio-based speaker identification degrades severely when training and test conditions are mismatched, whether because of the channel or because of noise. In this paper, we explore several techniques for combining video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions. Specifically, we explore techniques for optimally determining the relative weights of the independent audio-based and video-based decisions so as to achieve the best combination. Experiments on video broadcast news data show that fusion yields significant improvements in acoustically degraded conditions.
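The abstract does not say how the relative audio and video weights are computed, so the following is only a generic sketch of weighted score-level (late) fusion, not the authors' method. It assumes each modality produces one score per enrolled speaker (higher means a better match), z-normalizes each trial's scores so the two modalities are on a comparable scale, combines them as w * audio + (1 - w) * video, and picks w by grid search for the best identification accuracy on a development set. The function names `znorm`, `fused_decision`, and `choose_weight` are hypothetical.

```python
from statistics import mean, pstdev


def znorm(scores):
    """Z-normalize one trial's per-speaker scores (assumed preprocessing step)."""
    mu = mean(scores)
    sd = pstdev(scores) or 1.0  # guard against all-identical scores
    return [(s - mu) / sd for s in scores]


def fused_decision(audio, video, w):
    """Return the speaker index maximizing w*audio + (1 - w)*video after z-norm."""
    a, v = znorm(audio), znorm(video)
    fused = [w * x + (1.0 - w) * y for x, y in zip(a, v)]
    return max(range(len(fused)), key=fused.__getitem__)


def choose_weight(dev_trials, steps=10):
    """Grid-search the audio weight w in [0, 1] for best dev-set accuracy.

    dev_trials: list of (audio_scores, video_scores, true_speaker_index).
    On ties the smallest such w is kept.
    """
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        correct = sum(fused_decision(a, v, w) == label for a, v, label in dev_trials)
        acc = correct / len(dev_trials)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


if __name__ == "__main__":
    # Two toy trials, both with true speaker 0: in the first the audio scores
    # weakly favor the wrong speaker, in the second the video scores do.
    dev = [
        ([1.0, 1.2, 0.0], [3.0, 0.0, 0.0], 0),
        ([3.0, 0.0, 0.0], [1.0, 1.2, 0.0], 0),
    ]
    print(choose_weight(dev))  # → (0.2, 1.0)
```

Neither modality alone (w = 0 or w = 1) identifies both toy trials correctly, while intermediate weights do, which is the kind of complementarity the abstract relies on when the audio is degraded. A real system would tune w on held-out data matching the expected noise conditions, or adapt it per segment.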
Pages: 71-79
Page count: 8
Published in: Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 2001, 29(1-2): 71-79