Synchronization of Multiple Camera Videos Using Audio-Visual Features

Cited by: 31

Authors:

Shrestha, Prarthana [1]

Barbieri, Mauro [1]

Weda, Hans [1]

Sekulovski, Dragan [1]

Affiliation:

[1] Philips Res Europe, NL-5656 AE Eindhoven, Netherlands

Source:

IEEE Transactions on Multimedia | 2010 / Vol. 12 / No. 1

Keywords:

Content analysis and synthesis; feature extraction and representation; joint media and multimodal processing

DOI:

10.1109/TMM.2009.2036285

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Digital video capture is becoming popular as camcorder prices fall and devices with embedded video cameras, such as digital still cameras, mobile phones and PDAs, become more widely available. While raw home video is often considered visually unappealing, having multiple recordings of the same event provides an opportunity to combine audio and video segments from different cameras to improve quality and aesthetics. Mixing content from different recordings requires precise synchronization among them. In most present applications, synchronization is done manually and is considered a very tedious task. In this paper, we propose a novel automated synchronization approach based on detecting and matching audio and video features extracted from the recorded content. We experimentally assess three realizations of this approach on a common data set and make recommendations on their usability in practical use cases. The realizations place no limitations on the number or movement of the cameras. Moreover, they are robust against various ambient noises and audio-visual artifacts occurring during the recordings.


Pages: 79-92

Page count: 14
