MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information

Cited by: 2
Authors
Wang, Jianrong [1]
Huo, Yuchen [2]
Liu, Li [3]
Xu, Tianyi [1]
Li, Qi [4]
Li, Sen [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Tianjin Univ, Tianjin Int Engn Inst, Tianjin, Peoples R China
[3] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
[4] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
Source
INTERSPEECH 2023
Funding
National Natural Science Foundation of China
Keywords
Audio-Visual Speech Recognition; Mandarin Audio-Visual Corpus; Azure Kinect; Depth Information; SPEECH; RECOGNITION; TECHNOLOGY
DOI
10.21437/Interspeech.2023-823
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Audio-visual speech recognition (AVSR) has gained increasing attention from researchers as an important part of human-computer interaction. However, existing Mandarin audio-visual datasets are limited and lack depth information. To address this issue, this work establishes MAVD, a new large-scale Mandarin multimodal corpus comprising 12,484 utterances spoken by 64 native Chinese speakers. To ensure that the dataset covers diverse real-world scenarios, we developed a pipeline for cleaning and filtering the raw text material to create well-balanced reading material. In particular, Microsoft's latest data acquisition device, the Azure Kinect, is used to capture depth information in addition to the traditional audio signals and RGB images. We also provide a baseline experiment that can be used to evaluate the effectiveness of the dataset. The dataset and code will be released at https://github.com/SpringHuo/MAVD.
Pages: 2113-2117
Number of pages: 5