Audio-Visual Speaker Recognition for Video Broadcast News

Cited by: 0

Authors:

Benoît Maison

Chalapathy Neti

Andrew Senior

Affiliation:

[1] IBM Thomas J. Watson Research Center

Source:

Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 2001, Vol. 29

Keywords:

speaker identification; face recognition; multimodal; fusion; broadcast news

DOI:

Not available

Abstract:

Audio-based speaker identification degrades severely when training and test conditions are mismatched, due to either channel or noise differences. In this paper, we explore various techniques for combining video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions. Specifically, we explore techniques for optimally determining the relative weights of the independent audio and video decisions so as to achieve the best combination. Experiments on video broadcast news data show that fusion yields significant improvements in acoustically degraded conditions.
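
A minimal sketch of weighted score-level fusion, assuming each modality produces one score per enrolled speaker and that a single weight alpha is chosen by grid search on held-out data. The names `znorm`, `fuse_scores`, `identify` and `tune_alpha`, the z-normalization step, and the toy data are illustrative assumptions, not the authors' published procedure.

```python
import statistics
from typing import List, Sequence, Tuple

# Hypothetical development-set item: (audio scores, video scores, index of the true speaker)
DevItem = Tuple[Sequence[float], Sequence[float], int]


def znorm(scores: Sequence[float]) -> List[float]:
    """Z-normalize one modality's scores across enrolled speakers (assumed step)."""
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd if sd > 0 else 0.0 for s in scores]


def fuse_scores(audio: Sequence[float], video: Sequence[float], alpha: float) -> List[float]:
    """Weighted sum alpha * audio + (1 - alpha) * video of the normalized scores."""
    if len(audio) != len(video):
        raise ValueError("audio and video must score the same enrolled speakers")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a, v = znorm(audio), znorm(video)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, v)]


def identify(audio: Sequence[float], video: Sequence[float], alpha: float) -> int:
    """Return the index of the speaker with the highest fused score."""
    fused = fuse_scores(audio, video, alpha)
    return max(range(len(fused)), key=fused.__getitem__)


def tune_alpha(dev_set: Sequence[DevItem], steps: int = 11) -> float:
    """Grid-search alpha in [0, 1] for the best identification accuracy on held-out data.

    Ties keep the smallest alpha, since it is reached first.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    best_alpha, best_correct = 0.0, -1
    for i in range(steps):
        alpha = i / (steps - 1)
        correct = sum(identify(a, v, alpha) == true_id for a, v, true_id in dev_set)
        if correct > best_correct:
            best_alpha, best_correct = alpha, correct
    return best_alpha


if __name__ == "__main__":
    # Toy data in which the audio scores are unreliable (e.g. a mismatched channel)
    dev = [
        ([0.9, 0.1, 0.0], [0.0, 1.0, 0.0], 1),
        ([0.2, 0.8, 0.0], [1.0, 0.0, 0.0], 0),
        ([0.0, 0.0, 1.0], [0.0, 0.0, 1.0], 2),
    ]
    print(tune_alpha(dev))  # prints 0.0
```

On this toy set the audio scores point to the wrong speaker in two of three cases, so the search settles on a video-only weight (alpha = 0.0); with clean audio the same procedure would be expected to favor a larger alpha.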

Pages: 71-79

Page count: 9