Audio-video feature correlation: Faces and speech

Authors
Durand, G [1]
Montacié, C [1]
Caraty, MJ [1]
Faudemay, P [1]
Affiliations
[1] Univ Paris 06, Lab Informat Paris 6, F-75252 Paris 05, France
Keywords
speech analysis; face detection; audio-video joint analysis
DOI
10.1117/12.360415
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was then applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. As expected, we found that the extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
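The abstract does not say which correlation measure was used. As an illustration only, the face/speech comparison could be sketched by looking up the audio label at each keyframe time and computing the phi coefficient of the resulting 2x2 contingency table. The phi coefficient, the function names, the segment format and the sample data below are all assumptions made for this sketch, not details from the paper.

```python
from math import sqrt

def label_at(segments, t):
    """Return the audio label of the segment covering time t, or None."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None

def face_speech_phi(segments, keyframes):
    """Contingency counts and phi coefficient between face presence
    and speech presence, sampled at keyframe times (assumed measure)."""
    n11 = n10 = n01 = n00 = 0
    for t, has_face in keyframes:
        is_speech = label_at(segments, t) == "speech"
        if has_face and is_speech:
            n11 += 1          # face and speech
        elif has_face:
            n10 += 1          # face, no speech
        elif is_speech:
            n01 += 1          # speech, no face
        else:
            n00 += 1          # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    phi = (n11 * n00 - n10 * n01) / denom if denom else 0.0
    return (n11, n10, n01, n00), phi

# Hypothetical data: (start_s, end_s, label) audio segments and
# (time_s, face_detected) keyframes. Not taken from the paper.
segments = [(0, 5, "silence"), (5, 20, "speech"),
            (20, 30, "music"), (30, 45, "speech")]
keyframes = [(2, False), (8, True), (15, True),
             (25, False), (33, True), (40, False)]

table, phi = face_speech_phi(segments, keyframes)
print(table, round(phi, 3))
```

A phi value near 1 would indicate that faces tend to appear when speech is present, a value near 0 that the two streams are unrelated at the sampled times. Sampling only at keyframe times is a simplification; a duration-weighted comparison would be another reasonable choice.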
Pages: 102-112
Page count: 11