Speaker dependent video indexing based on audio-visual interaction

Authors
Tsekeridou, S [1]
Pitas, I [1]
Affiliation
[1] Univ Thessaloniki, Dept Informat, GR-54006 Salonika, Greece
Keywords
DOI
Not available
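Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;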
Abstract
This paper presents a content-based video indexing method that temporally indexes a video sequence according to the actual speaker by integrating audio and visual information. Audio analysis extracts a diagram of speaker identity labels versus time. Visual analysis comprises scene cut detection, face shot determination, mouth region extraction and tracking, and finally talking face shot determination. The results from both sources are combined to improve speaker-dependent video indexing. Such indexing enables flexible video retrieval or browsing when queries are posed by speaker identity. Speaker recognition errors are reduced to 2%.
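
The abstract does not say how the audio and visual results are combined. As a purely illustrative sketch, and not the authors' method, one simple way to fuse them is to assume that each talking face shot contains a single speaker and to replace the audio speaker labels inside that shot with their majority label. The function names `fuse_labels` and `to_index`, the per-time-step label list, and the `(start, end, is_talking_face)` shot tuples are all hypothetical choices made for this example.

```python
from collections import Counter


def fuse_labels(audio_labels, shots):
    """Smooth audio speaker labels using talking face shots.

    audio_labels: one speaker label per time step (assumed input format).
    shots: (start, end, is_talking_face) tuples, end exclusive (assumed).
    Assumption: a talking face shot contains exactly one speaker.
    """
    fused = list(audio_labels)
    for start, end, is_talking_face in shots:
        # Clamp the shot to the available audio so slices stay in range.
        start = max(start, 0)
        end = min(end, len(fused))
        if not is_talking_face or start >= end:
            continue
        # Majority vote over the original audio labels inside the shot.
        majority, _ = Counter(audio_labels[start:end]).most_common(1)[0]
        fused[start:end] = [majority] * (end - start)
    return fused


def to_index(labels):
    """Collapse per-step labels into (start, end, speaker) index entries."""
    entries = []
    for t, label in enumerate(labels):
        if entries and entries[-1][2] == label:
            # Extend the current entry while the speaker stays the same.
            entries[-1] = (entries[-1][0], t + 1, label)
        else:
            entries.append((t, t + 1, label))
    return entries


if __name__ == "__main__":
    audio = ["A", "A", "B", "A", "B", "B", "B", "A"]  # noisy audio labels
    shots = [(0, 4, True), (4, 8, True)]              # two talking face shots
    print(to_index(fuse_labels(audio, shots)))
```

In this toy input, the isolated label in each shot is overruled by the majority, so the resulting index has one entry per shot. Shots that are not talking face shots are left untouched, which keeps the audio labels as the only evidence there.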
Pages: 358-362
Page count: 5