AVbook, a high-frame-rate corpus of narrative audiovisual speech for investigating multimodal speech perception

Cited by: 0

Authors:

Varano, Enrico [1]

Guilleminot, Pierre [1]

Reichenbach, Tobias [2]

Affiliations:

[1] Imperial Coll London, Ctr Neurotechnol, Dept Bioengn, London, England

[2] Friedrich Alexander Univ Erlangen Nurnberg, Dept Artificial Intelligence Biomed Engn, Erlangen, Germany

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2023 / Vol. 153 / No. 05

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Audio recordings - Speech recognition;

DOI:

10.1121/10.0019460

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206; 082403;

Abstract:

Seeing a speaker's face can help substantially with understanding their speech, particularly in challenging listening conditions. Research into the neurobiological mechanisms behind audiovisual integration has recently begun to employ continuous natural speech. However, these efforts are impeded by a lack of high-quality audiovisual recordings of a speaker narrating a longer text. Here, we seek to close this gap by developing AVbook, an audiovisual speech corpus designed for cognitive neuroscience studies and audiovisual speech recognition. The corpus consists of 3.6 h of audiovisual recordings of two speakers, one male and one female, each reading 59 passages from a narrative English text. The recordings were acquired at a high frame rate of 119.88 frames/s. The corpus includes phone-level alignment files and a set of multiple-choice questions to test attention to the different passages. We verified the efficacy of these questions in a pilot study. A short written summary is also provided for each recording. To enable audiovisual synchronization when presenting the stimuli, four videos of an electronic clapperboard were recorded with the corpus. The corpus is publicly available to support research into the neurobiology of audiovisual speech processing as well as the development of computer algorithms for audiovisual speech recognition.
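The abstract gives a frame rate of 119.88 frames/s and mentions clapperboard videos for aligning audio and video during stimulus presentation. The Python sketch below shows the frame/time arithmetic this implies, along with one simple way to estimate an audio-video lag from a single clap. It assumes that 119.88 frames/s is the rounded form of the NTSC-style rate 120000/1001, and that the clap has already been located in both the audio track (in seconds) and the video (as a frame index). The function names are hypothetical and are not part of any tooling distributed with the corpus.

```python
from fractions import Fraction

# Assumption: the quoted 119.88 frames/s is the rounded NTSC-style rate 120000/1001.
FRAME_RATE = Fraction(120000, 1001)

# Average recording length implied by the abstract: 3.6 h over 2 speakers x 59 passages.
MEAN_PASSAGE_S = 3.6 * 3600 / (2 * 59)  # roughly 110 s per recording


def frame_to_seconds(frame_index: int, rate: Fraction = FRAME_RATE) -> float:
    """Start time in seconds of a zero-based video frame (hypothetical helper)."""
    return float(frame_index / rate)


def seconds_to_frame(t: float, rate: Fraction = FRAME_RATE) -> int:
    """Index of the frame boundary nearest to time t in seconds (hypothetical helper)."""
    return round(Fraction(t) * rate)


def av_offset(clap_audio_s: float, clap_video_frame: int) -> float:
    """Audio-minus-video lag in seconds estimated from one clapperboard event.

    A positive value means the clap is heard later than it is seen, so the
    audio track would need to be shifted earlier by this amount.
    """
    return clap_audio_s - frame_to_seconds(clap_video_frame)


if __name__ == "__main__":
    print(f"frame duration: {1000 / float(FRAME_RATE):.3f} ms")
    print(f"frame 120 starts at {frame_to_seconds(120):.3f} s")
    print(f"estimated lag: {av_offset(1.05, 120) * 1000:.1f} ms")
```

At this frame rate each frame lasts about 8.34 ms, so a single clap can only pin the offset down to roughly one frame. Averaging over several clapperboard recordings is one way to reduce that error.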


Pages: 3130-3137

Number of pages: 8
