Synchronization of Multiple Camera Videos Using Audio-Visual Features

Cited by: 31
Authors
Shrestha, Prarthana [1]
Barbieri, Mauro [1]
Weda, Hans [1]
Sekulovski, Dragan [1]
Affiliations
[1] Philips Res Europe, NL-5656 AE Eindhoven, Netherlands
Keywords
Content analysis and synthesis; feature extraction and representation; joint media and multimodal processing
DOI
10.1109/TMM.2009.2036285
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Digital video capture is becoming popular as camcorder prices fall and devices with embedded video cameras, such as digital still cameras, mobile phones, and PDAs, become more widely available. While raw home video is considered visually unappealing, having multiple recordings of the same event provides the opportunity to combine audio and video segments from different cameras to improve quality and aesthetics. Mixing content from different recordings requires precise synchronization among the recordings. In most present applications, synchronization is done manually and is considered a very tedious task. In this paper, we propose a novel automated synchronization approach based on detecting and matching audio and video features extracted from the recorded content. We experimentally assess three realizations of this approach on a common data set and make recommendations on the usability of the different realizations in practical use cases. The realizations place no limitations on the number or movement of the cameras. Moreover, they are robust against various ambient noises and audio-visual artifacts occurring during the recordings.
Pages: 79-92
Number of pages: 14