Speaker dependent video indexing based on audio-visual interaction

Cited by: 0

Authors:

Tsekeridou, S [1]

Pitas, I [1]

Affiliations:

[1] Univ Thessaloniki, Dept Informat, GR-54006 Salonika, Greece

Source:

1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 1 | 1998

Keywords:

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

A content-based video indexing method is presented in this paper that aims at temporally indexing a video sequence according to the actual speaker. This is achieved by the integration of audio and visual information. Audio analysis leads to the extraction of a speaker identity label versus time diagram. Visual analysis includes scene cut detection, face shot determination, mouth region extraction and tracking, and finally talking-face shot determination. Results from both sources are combined to improve speaker-dependent video indexing. Such a task enables flexible video retrieval or browsing in cases where queries according to speaker identities are imposed. Speaker recognition errors are reduced to 2%.


Pages: 358-362

Page count: 5

50 items in total

[1] Content-based video parsing and indexing based on audio-visual interaction
Tsekeridou, S
Pitas, I
[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2001, 11 (04) : 522 - 535
[2] Indexing audio-visual sequences by joint audio and video processing
Saraceno, C
Leonardi, R
[J]. VSMM98: FUTUREFUSION - APPLICATION REALITIES FOR THE VIRTUAL AGE, VOLS 1 AND 2, 1998 : 686 - 691
[3] Audio-visual content analysis for content-based video indexing
Tsekeridou, S
Pitas, I
[J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, PROCEEDINGS VOL 1, 1999 : 667 - 672
[4] Audio-visual content analysis for content-based video indexing
Tsekeridou, Sofia
Pitas, Ioannis
[J]. International Conference on Multimedia Computing and Systems -Proceedings, 1999, 1 : 667 - 672
[5] Combining text and audio-visual features in video indexing
Chang, SF
Manmatha, R
Chua, TS
[J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005 : 1005 - 1008
[6] Audio-visual speaker recognition for video broadcast news
Maison, B
Neti, C
Senior, A
[J]. JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2001, 29 (1-2) : 71 - 79
[7] Audio-Visual Speaker Recognition for Video Broadcast News
Benoît Maison
Chalapathy Neti
Andrew Senior
[J]. Journal of VLSI signal processing systems for signal, image and video technology, 2001, 29 : 71 - 79
[8] Audio-visual biometric based speaker identification
Kar, Biswajit
Bhatia, Sandeep
Dutta, P. K.
[J]. ICCIMA 2007: INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, VOL IV, PROCEEDINGS, 2007 : 94 - 98
[9] Audio-visual speaker identification based on the use of dynamic audio and visual features
Fox, N
Reilly, RB
[J]. AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 743 - 751
[10] Audio-Visual Synchronisation for Speaker Diarisation
Garau, Giulia
Dielmann, Alfred
Bourlard, Herve
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010 : 2662 - +
