Fusion of Speech, Faces and Text for Person Identification in TV Broadcast

Authors
Bredin, Herve [1]
Poignant, Johann [2]
Tapaswi, Makarand [3]
Fortier, Guillaume [4]
Viet Bac Le [5]
Napoleon, Thibault [6]
Gao, Hua [3]
Barras, Claude [1]
Rosset, Sophie [1]
Besacier, Laurent [2]
Verbeek, Jakob [4]
Quenot, Georges [2]
Jurie, Frederic [6]
Ekenel, Hazim Kemal [3]
Affiliations
[1] Univ Paris 11, CNRS, UPR 3251, LIMSI, BP 133, F-91403 Orsay, France
[2] UJF Grenoble 1, UPMF Grenoble 2, Grenoble INP, CNRS, UMR 5217, LIG, F-38041 Grenoble, France
[3] Karlsruher Inst Technol, Karlsruhe, Germany
[4] INRIA Rhone Alpes, F-38330 Montbonnot St Martin, France
[5] Vocapia Res, F-91400 Orsay, France
[6] Univ Caen, GREYC, UMR 6072, F-14050 Caen, France
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The Repere challenge is a project that evaluates systems for supervised and unsupervised multimodal recognition of people in TV broadcasts. In this paper, we describe, evaluate and discuss the QCompere consortium's submissions to the dry run of the 2012 Repere evaluation campaign. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as that of supervised monomodal ones (with several hundred identity models).
Pages: 385-394
Number of pages: 10