Audio-video feature correlation: Faces and speech

Cited by: 1
Authors
Durand, G [1]
Montacié, C [1]
Caraty, MJ [1]
Faudemay, P [1]
Affiliations
[1] Univ Paris 06, Lab Informat Paris 6, F-75252 Paris 05, France
Keywords
speech analysis; face detection; audio-video joint analysis;
DOI
10.1117/12.360415
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was then applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. As expected, we found that the extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
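The face/voice correlation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation statistic the authors used, so the phi coefficient (Pearson correlation for two binary variables) is an assumption here, as are the segment labels, the timestamps, and the helper names `speech_at` and `phi_coefficient`. The toy data is hypothetical and is not taken from the paper.

```python
import math
from typing import List, Tuple

# (start_seconds, end_seconds, label); labels mirror the four audio classes
# named in the abstract. The values below are made-up illustration data.
Segment = Tuple[float, float, str]


def speech_at(segments: List[Segment], t: float) -> bool:
    """Return True if time t falls inside a segment labelled 'speech'."""
    return any(start <= t < end and label == "speech"
               for start, end, label in segments)


def phi_coefficient(xs: List[bool], ys: List[bool]) -> float:
    """Phi coefficient between two equally long binary sequences.

    Returns 0.0 when either sequence is constant (undefined correlation).
    """
    n11 = sum(1 for x, y in zip(xs, ys) if x and y)
    n10 = sum(1 for x, y in zip(xs, ys) if x and not y)
    n01 = sum(1 for x, y in zip(xs, ys) if not x and y)
    n00 = sum(1 for x, y in zip(xs, ys) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical output of an audio partitioning step.
segments: List[Segment] = [
    (0.0, 5.0, "silence"),
    (5.0, 20.0, "speech"),
    (20.0, 30.0, "music"),
    (30.0, 45.0, "speech"),
]

# Hypothetical output of a face detector: (keyframe_time, face_present).
keyframes: List[Tuple[float, bool]] = [
    (2.0, False), (8.0, True), (15.0, True),
    (25.0, False), (33.0, True), (40.0, False),
]

face_flags = [face for _, face in keyframes]
speech_flags = [speech_at(segments, t) for t, _ in keyframes]
phi = phi_coefficient(face_flags, speech_flags)
print(f"phi = {phi:.3f}")
```

Sampling the audio label at each keyframe timestamp is the simplest way to put the two streams on a common index. A real system might instead measure the overlap between shots and speech segments.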
Pages: 102-112
Page count: 11