Fusion of Speech, Faces and Text for Person Identification in TV Broadcast

Cited by: 0

Authors:

Bredin, Herve [1]

Poignant, Johann [2]

Tapaswi, Makarand [3]

Fortier, Guillaume [4]

Viet Bac Le [5]

Napoleon, Thibault [6]

Gao, Hua [3]

Barras, Claude [1]

Rosset, Sophie [1]

Besacier, Laurent [2]

Verbeek, Jakob [4]

Quenot, Georges [2]

Jurie, Frederic [6]

Ekenel, Hazim Kemal [3]

Affiliations:

[1] Univ Paris 11, CNRS, UPR 3251, LIMSI, BP 133, F-91403 Orsay, France

[2] UJF Grenoble 1, UPMF Grenoble 2, Grenoble INP, CNRS,UMR 5217,LIG, F-38041 Grenoble, France

[3] Karlsruher Inst Technol, Karlsruhe, Germany

[4] INRIA Rhone Alpes, F-38330 Montbonnot St Martin, France

[5] Vocapia Res, F-91400 Orsay, France

[6] Univ Caen, GREYC, UMR 6072, F-14050 Caen, France

Source:

COMPUTER VISION - ECCV 2012, PT III | 2012 / Vol. 7585

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Repere challenge is a project that aims to evaluate systems for supervised and unsupervised multimodal recognition of people in TV broadcast. In this paper, we describe, evaluate and discuss the QCompere consortium submissions to the 2012 Repere evaluation campaign dry-run. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as supervised monomodal ones (with several hundred identity models).

Pages: 385-394

Page count: 10