Audio-video feature correlation: Faces and speech

Cited by: 1

Authors:

Durand, G [1]

Montacié, C [1]

Caraty, MJ [1]

Faudemay, P [1]

Affiliation:

[1] Univ Paris 06, Lab Informat Paris 6, F-75252 Paris 05, France

Source:

MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS IV | 1999 / Vol. 3846

Keywords:

speech analysis; face detection; audio-video joint analysis

DOI:

10.1117/12.360415

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We found, as expected, that the extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.


Pages: 102-112

Page count: 11
