AVbook, a high-frame-rate corpus of narrative audiovisual speech for investigating multimodal speech perception

Authors
Varano, Enrico [1]
Guilleminot, Pierre [1]
Reichenbach, Tobias [2]
Affiliations
[1] Imperial College London, Centre for Neurotechnology, Department of Bioengineering, London, England
[2] Friedrich-Alexander-Universität Erlangen-Nürnberg, Department of Artificial Intelligence in Biomedical Engineering, Erlangen, Germany
Source
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Audio recordings - Speech recognition;
DOI
10.1121/10.0019460
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Seeing a speaker's face can help substantially with understanding their speech, particularly in challenging listening conditions. Research into the neurobiological mechanisms behind audiovisual integration has recently begun to employ continuous natural speech. However, these efforts are impeded by a lack of high-quality audiovisual recordings of a speaker narrating a longer text. Here, we seek to close this gap by developing AVbook, an audiovisual speech corpus designed for cognitive neuroscience studies and audiovisual speech recognition. The corpus consists of 3.6 h of audiovisual recordings of two speakers, one male and one female, each reading 59 passages from a narrative English text. The recordings were acquired at a high frame rate of 119.88 frames/s. The corpus includes phone-level alignment files and a set of multiple-choice questions to test attention to the different passages. We verified the efficacy of these questions in a pilot study. A short written summary is also provided for each recording. To enable audiovisual synchronization when presenting the stimuli, four videos of an electronic clapperboard were recorded with the corpus. The corpus is publicly available to support research into the neurobiology of audiovisual speech processing as well as the development of computer algorithms for audiovisual speech recognition.
Pages: 3130-3137
Number of pages: 8