Fusion of Speech, Faces and Text for Person Identification in TV Broadcast

Cited by: 0
Authors
Bredin, Herve [1]
Poignant, Johann [2]
Tapaswi, Makarand [3]
Fortier, Guillaume [4]
Le, Viet Bac [5]
Napoleon, Thibault [6]
Gao, Hua [3]
Barras, Claude [1]
Rosset, Sophie [1]
Besacier, Laurent [2]
Verbeek, Jakob [4]
Quenot, Georges [2]
Jurie, Frederic [6]
Ekenel, Hazim Kemal [3]
Affiliations
[1] Univ Paris 11, CNRS, UPR 3251, LIMSI, BP 133, F-91403 Orsay, France
[2] UJF Grenoble 1, UPMF Grenoble 2, Grenoble INP, CNRS, UMR 5217, LIG, F-38041 Grenoble, France
[3] Karlsruher Inst Technol, Karlsruhe, Germany
[4] INRIA Rhone Alpes, F-38330 Montbonnot St Martin, France
[5] Vocapia Res, F-91400 Orsay, France
[6] Univ Caen, GREYC, UMR 6072, F-14050 Caen, France
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Repere challenge is a project that evaluates systems for supervised and unsupervised multimodal recognition of people in TV broadcasts. In this paper, we describe, evaluate and discuss the QCompere consortium's submissions to the dry run of the 2012 Repere evaluation campaign. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as that of supervised monomodal ones (with several hundred identity models).
Pages: 385-394
Page count: 10
Related papers
50 in total
  • [31] DIF: Dataset of Perceived Intoxicated Faces for Drunk Person Identification
    Mehta, Vineet
    Yadav, Devendra Pratap
    Katta, Sai Srinadhu
    Dhall, Abhinav
    ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019: 367-374
  • [32] Person identification based on multichannel and multimodality fusion
    Liu, Ming
    Tang, Hao
    Ning, Huazhong
    Huang, Thomas
    Multimodal Technologies for Perception of Humans, 2007, 4122: 241-248
  • [33] Fusion of Fingerprint, Palmprint and Iris for Person Identification
    Patil, Archana P.
    Bhalke, D. G.
    2016 INTERNATIONAL CONFERENCE ON AUTOMATIC CONTROL AND DYNAMIC OPTIMIZATION TECHNIQUES (ICACDOT), 2016: 960-963
  • [34] Person Re-identification by Features Fusion
    Wan Xin
    Ge Dongdong
    Li Peng
    Ji Zhe
    2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC), 2016: 285-289
  • [35] Multimodal Speaker Identification Based on Text and Speech
    Moschonas, Panagiotis
    Kotropoulos, Constantine
    BIOMETRICS AND IDENTITY MANAGEMENT, 2008, 5372: 100-109
  • [36] Text analysis and language identification for polyglot text-to-speech synthesis
    Romsdorfer, Harald
    Pfister, Beat
    SPEECH COMMUNICATION, 2007, 49(09): 697-724
  • [37] Efficient Portable Camera Based Text to Speech Converter for Blind Person
    Shah, Trupti
    Parshionikar, Sangeeta
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INTELLIGENT SUSTAINABLE SYSTEMS (ICISS 2019), 2019: 353-358
  • [38] Audio and Text Synchronization for TV news Subtitling based on Automatic Speech Recognition
    Enrique Garcia, Jose
    Ortega, Alfonso
    Lleida, Eduardo
    Lozano, Tomas
    Bernues, Emiliano
    Sanchez, Daniel
    BMSB: 2009 IEEE INTERNATIONAL SYMPOSIUM ON BROADBAND MULTIMEDIA SYSTEMS AND BROADCASTING, VOLS 1 AND 2, 2009: 277-+
  • [39] Automatic propagation of manual annotations for multimodal person identification in TV shows
    Budnik, Mateusz
    Poignant, Johann
    Besacier, Laurent
    Quenot, Georges
    2014 12TH INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2014
  • [40] Real-time person identification system for intelligent digital TV
    Hwang, Min-Cheol
    Ha, Le Thanh
    Kim, Seung-Kyun
    Ko, Sung-Jea
    ICCE: 2007 DIGEST OF TECHNICAL PAPERS INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, 2007: 103-+