Cost-Effective Solution to Synchronized Audio-Visual Capture using Multiple Sensors

Cited by: 10

Authors:

Lichtenauer, Jeroen [1]

Valstar, Michel [1]

Shen, Jie [1]

Pantic, Maja [1]

Affiliations:

[1] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2AZ, England

Source:

AVSS: 2009 6TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE | 2009

Keywords:

Video recording; Audio recording; Multisensor systems; Synchronization

DOI:

10.1109/AVSS.2009.92

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Applications such as surveillance and human motion capture require high-bandwidth recording from multiple cameras. Furthermore, the recent increase in research on sensor fusion has raised the demand on synchronization accuracy between video, audio and other sensor modalities. Previously, capturing synchronized, high-resolution video from multiple cameras required complex, inflexible and expensive solutions. Our experiments show that a single PC, built from contemporary low-cost computer hardware, could currently handle up to 470 MB/s of input data. This allows capturing from 18 cameras of 780x580 pixels at 60 fps each, or 36 cameras at 30 fps. Furthermore, we achieve accurate synchronization between audio, video and additional sensors by recording audio together with sensor trigger or timestamp signals, using a multi-channel audio input. In this way, each sensor modality can be captured with separate software and hardware, allowing maximal flexibility with minimal cost.
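The camera counts quoted in the abstract can be checked against the stated throughput with a short calculation. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: each pixel is stored as one byte (raw 8-bit sensor data), and "MB" is read as 2**20 bytes. The function name `required_rate_mib_s` is hypothetical.

```python
# Assumptions (not stated in the abstract): 1 byte per pixel (raw 8-bit
# sensor data) and "MB" interpreted as 2**20 bytes.
BYTES_PER_PIXEL = 1
MIB = 2 ** 20


def required_rate_mib_s(cameras, width, height, fps,
                        bytes_per_pixel=BYTES_PER_PIXEL):
    """Aggregate uncompressed input rate in MiB/s for identical cameras."""
    bytes_per_second = cameras * width * height * bytes_per_pixel * fps
    return bytes_per_second / MIB


# The two configurations named in the abstract.
rate_60 = required_rate_mib_s(18, 780, 580, 60)
rate_30 = required_rate_mib_s(36, 780, 580, 30)

print(f"18 cameras at 60 fps: {rate_60:.1f} MiB/s")
print(f"36 cameras at 30 fps: {rate_30:.1f} MiB/s")
```

Under these assumptions both configurations need the same rate, roughly 466 MiB/s, which sits just below the reported capacity. With decimal megabytes the same load would be about 489 MB/s, so the binary reading appears to be the one consistent with the quoted figure; a different pixel format would change the result proportionally.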


Pages: 324-329

Page count: 6
