Indonesian Audio-Visual Speech Corpus for Multimodal Automatic Speech Recognition

Authors
Maulana, Muhammad Rizki Aulia Rahman [1]
Fanany, Mohamad Ivan [1]
Affiliations
[1] Univ Indonesia, Fac Comp Sci, Kampus UI Depok, Jawa Barat 16424, Indonesia
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Advances in Automatic Speech Recognition (ASR) rely heavily on the availability of data, even more so for deep learning ASR systems, which are at the forefront of ASR research. Many corpora have been built to meet this need, ranging from single-modality corpora, which mostly serve acoustic speech recognition (with several exceptions for visual speech decoding), to multimodal corpora, which serve both modalities. Multimodal corpora have been significant in the development of ASR because speech is inherently multimodal. Despite this importance, none of these corpora was built for the Indonesian language, resulting in little to no development of visual-only or multimodal ASR systems for Indonesian. This research attempts to solve that problem by constructing AVID, an Indonesian audio-visual speech corpus for multimodal ASR. The corpus consists of 10 speakers each speaking 1,040 sentences with a simple structure, resulting in 10,400 videos of spoken sentences. To the best of our knowledge, AVID is the first audio-visual speech corpus for the Indonesian language designed for multimodal ASR. AVID was heavily tested and shows low overall error in the tests of both modalities, which indicates the high quality of the corpus and its suitability for building multimodal ASR systems.
Pages: 381-385
Number of pages: 5