Audio-video feature correlation: Faces and speech

Cited by: 1
Authors
Durand, G [1]
Montacié, C [1]
Caraty, MJ [1]
Faudemay, P [1]
Affiliations
[1] Univ Paris 06, Lab Informat Paris 6, F-75252 Paris 05, France
Keywords
speech analysis; face detection; audio-video joint analysis
DOI
10.1117/12.360415
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was then applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and the presence of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. As expected, we found that the extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
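The abstract does not say how the face/voice correlation was measured. As a minimal sketch only, one plausible way to quantify it is the phi coefficient between two binary indicators sampled at each keyframe: "a face was detected" and "the keyframe falls inside a Speech segment". The function names, the segment format, and the sample data below are all assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt

def speech_at(time_s, segments):
    """Return True if time_s lies inside a segment labelled 'speech'.

    segments: list of (start_s, end_s, label) tuples (hypothetical format).
    """
    return any(start <= time_s < end and label == "speech"
               for start, end, label in segments)

def phi_coefficient(xs, ys):
    """Phi coefficient (Pearson correlation of two binary sequences)."""
    pairs = list(zip(xs, ys))
    n11 = sum(1 for x, y in pairs if x and y)          # face and speech
    n10 = sum(1 for x, y in pairs if x and not y)      # face, no speech
    n01 = sum(1 for x, y in pairs if not x and y)      # speech, no face
    n00 = sum(1 for x, y in pairs if not x and not y)  # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # undefined when one indicator is constant; report 0
    return (n11 * n00 - n10 * n01) / denom

# Made-up audio partition: (start, end, label), times in seconds.
segments = [(0, 10, "silence"), (10, 30, "speech"),
            (30, 40, "music"), (40, 60, "speech")]

# Made-up keyframes: (timestamp in seconds, face detected?).
keyframes = [(5, False), (15, True), (25, True),
             (35, False), (45, True), (55, False)]

face = [has_face for _, has_face in keyframes]
speech = [speech_at(t, segments) for t, _ in keyframes]
phi = phi_coefficient(face, speech)
print(round(phi, 4))
```

A value near 1 would indicate that faces and speech tend to co-occur, a value near 0 that they are unrelated. Sampling only at keyframes is itself a simplification; a real study might instead compare durations of overlap.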
Pages: 102-112
Page count: 11