Fusion of Speech, Faces and Text for Person Identification in TV Broadcast

Cited by: 0

Authors:

Bredin, Herve ^{[1]}

Poignant, Johann ^{[2]}

Tapaswi, Makarand ^{[3]}

Fortier, Guillaume ^{[4]}

Viet Bac Le ^{[5]}

Napoleon, Thibault ^{[6]}

Gao, Hua ^{[3]}

Barras, Claude ^{[1]}

Rosset, Sophie ^{[1]}

Besacier, Laurent ^{[2]}

Verbeek, Jakob ^{[4]}

Quenot, Georges ^{[2]}

Jurie, Frederic ^{[6]}

Ekenel, Hazim Kemal ^{[3]}

Affiliations:

[1] Univ Paris 11, CNRS, UPR 3251, LIMSI, BP 133, F-91403 Orsay, France

[2] UJF Grenoble 1, UPMF Grenoble 2, Grenoble INP, CNRS, UMR 5217, LIG, F-38041 Grenoble, France

[3] Karlsruher Inst Technol, Karlsruhe, Germany

[4] INRIA Rhone Alpes, F-38330 Montbonnot St Martin, France

[5] Vocapia Res, F-91400 Orsay, France

[6] Univ Caen, GREYC, UMR 6072, F-14050 Caen, France

Source:

COMPUTER VISION - ECCV 2012, PT III | 2012 / Vol. 7585

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Repere challenge is a project aiming at the evaluation of systems for supervised and unsupervised multimodal recognition of people in TV broadcast. In this paper, we describe, evaluate and discuss QCompere consortium submissions to the 2012 Repere evaluation campaign dry-run. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as supervised monomodal ones (with several hundreds of identity models).


Pages: 385-394

Page count: 10


