Data Augmentation for End-to-end Silent Speech Recognition for Laryngectomees

Cited by: 2
Authors:
Cao, Beiming [1 ,2 ]
Teplansky, Kristin [2 ]
Sebkhi, Nordine [3 ]
Bhavsar, Arpan [3 ]
Inan, Omer T. [3 ]
Samlan, Robin [4 ]
Mau, Ted [5 ]
Wang, Jun [2 ,6 ]
Affiliations:
[1] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
[2] Univ Texas Austin, Dept Speech Language & Hearing Sci, Austin, TX 78712 USA
[3] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[4] Univ Arizona, Dept Speech Language & Hearing Sci, Tucson, AZ 85721 USA
[5] Univ Texas Southwestern Med Ctr Dallas, Dept Otolaryngol, Dallas, TX 75390 USA
[6] Univ Texas Austin, Dell Med Sch, Dept Neurol, Austin, TX 78712 USA
Source: INTERSPEECH 2022
Funding: National Institutes of Health (NIH), USA
Keywords:
silent speech recognition; silent speech interface; data augmentation; alaryngeal speech; COMMUNICATION; MODEL;
DOI:
10.21437/Interspeech.2022-10868
CLC Classification: O42 [Acoustics]
Subject Classification: 070206; 082403
Abstract:
Silent speech recognition (SSR) predicts textual information from silent articulation and is a core algorithmic component of silent speech interfaces (SSIs). SSIs have the potential to restore the speech ability of individuals who have lost their voice but can still articulate (e.g., laryngectomees). Due to the logistical difficulties of articulatory data collection, current SSR studies suffer from limited amounts of training data. Data augmentation increases the effective amount of training data by introducing variations into the existing dataset, but it has rarely been investigated in SSR for laryngectomees. In this study, we investigated the effectiveness of multiple data augmentation approaches for SSR, including consecutive and intermittent time masking, articulatory dimension masking, sinusoidal noise injection, and random scaling. Experiments were conducted in speaker-dependent, speaker-independent, and speaker-adaptive setups. The SSR models were end-to-end speech recognition models trained with connectionist temporal classification (CTC). Electromagnetic articulography (EMA) datasets collected from multiple healthy speakers and laryngectomees were used. The experimental results demonstrated that the explored data augmentation approaches performed differently but generally improved SSR performance. In particular, consecutive time masking brought significant improvements in SSR for both healthy speakers and laryngectomees.
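The abstract names four families of augmentation applied to EMA feature sequences. The paper itself does not publish code, so the sketch below is only an illustration of what such operations could look like on a (frames × sensor-dimensions) array; all function names, mask lengths, and noise parameters are assumptions, not the authors' settings.

```python
import numpy as np

def consecutive_time_mask(x, max_len=20, rng=None):
    """Zero out one contiguous span of frames (span length is an assumed hyperparameter)."""
    rng = rng or np.random.default_rng()
    T = x.shape[0]
    L = int(rng.integers(1, max_len + 1))
    start = int(rng.integers(0, max(T - L, 1)))
    y = x.copy()
    y[start:start + L] = 0.0
    return y

def dimension_mask(x, n_dims=2, rng=None):
    """Zero out randomly chosen articulatory channels (columns)."""
    rng = rng or np.random.default_rng()
    y = x.copy()
    dims = rng.choice(x.shape[1], size=n_dims, replace=False)
    y[:, dims] = 0.0
    return y

def sinusoidal_noise(x, amp=0.05, freq=2.0, fs=100.0, rng=None):
    """Add a low-amplitude sinusoid with a random phase to every channel."""
    rng = rng or np.random.default_rng()
    t = np.arange(x.shape[0]) / fs
    phase = rng.uniform(0, 2 * np.pi)
    return x + amp * np.sin(2 * np.pi * freq * t + phase)[:, None]

def random_scale(x, low=0.9, high=1.1, rng=None):
    """Multiply the whole trajectory by one random global factor."""
    rng = rng or np.random.default_rng()
    return x * rng.uniform(low, high)

# Toy EMA-like feature matrix: 200 frames x 12 sensor dimensions.
x = np.random.default_rng(0).standard_normal((200, 12))
aug = random_scale(sinusoidal_noise(dimension_mask(consecutive_time_mask(x))))
print(aug.shape)  # → (200, 12): every operation preserves the input shape
```

Each transform returns an array of the same shape as its input, so augmented utterances can be fed to a CTC-trained model with no change to the training pipeline; intermittent time masking (also evaluated in the paper) would simply apply several short masks instead of one contiguous span.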
Pages: 3653-3657
Page count: 5
Related Papers (50 in total):
  • [1] Semantic Data Augmentation for End-to-End Mandarin Speech Recognition
    Sun, Jianwei
    Tang, Zhiyuan
    Yin, Hengxin
    Wang, Wei
    Zhao, Xi
    Zhao, Shuaijiang
    Lei, Xiaoning
    Zou, Wei
    Li, Xiangang
    [J]. INTERSPEECH 2021, 2021, : 1269 - 1273
  • [2] DATA AUGMENTATION FOR END-TO-END CODE-SWITCHING SPEECH RECOGNITION
    Du, Chenpeng
    Li, Hao
    Lu, Yizhou
    Wang, Lan
    Qian, Yanmin
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 194 - 200
  • [3] SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition
    Song, Xingchen
    Wu, Zhiyong
    Huang, Yiheng
    Su, Dan
    Meng, Helen
    [J]. INTERSPEECH 2020, 2020, : 581 - 585
  • [4] END-TO-END SILENT SPEECH RECOGNITION WITH ACOUSTIC SENSING
    Luo, Jian
    Wang, Jianzong
    Cheng, Ning
    Jiang, Guilin
    Xiao, Jing
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 606 - 612
  • [5] AUDITORY-BASED DATA AUGMENTATION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Tu, Zehai
    Deadman, Jack
    Ma, Ning
    Barker, Jon
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7447 - 7451
  • [6] CONVOLUTIONAL DROPOUT AND WORDPIECE AUGMENTATION FOR END-TO-END SPEECH RECOGNITION
    Xu, Hainan
    Huang, Yinghui
    Zhu, Yun
    Audhkhasi, Kartik
    Ramabhadran, Bhuvana
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5984 - 5988
  • [7] End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge
    Kimura, Naoki
    Su, Zixiong
    Saeki, Takaaki
    [J]. INTERSPEECH 2020, 2020, : 1025 - 1026
  • [8] Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios
    Tsunoo, Emiru
    Shibata, Kentaro
    Narisetty, Chaitanya
    Kashiwagi, Yosuke
    Watanabe, Shinji
    [J]. INTERSPEECH 2021, 2021, : 301 - 305
  • [9] STARGAN FOR EMOTIONAL SPEECH CONVERSION: VALIDATED BY DATA AUGMENTATION OF END-TO-END EMOTION RECOGNITION
    Rizos, Georgios
    Baird, Alice
    Elliott, Max
    Schuller, Bjorn
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2020, : 3502 - 3506
  • [10] Data Augmentation for End-to-End Optical Music Recognition
    Lopez-Gutierrez, Juan C.
    Valero-Mas, Jose J.
    Castellanos, Francisco J.
    Calvo-Zaragoza, Jorge
    [J]. DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021 WORKSHOPS, PT I, 2021, 12916 : 59 - 73