End-to-end speech recognition modeling from de-identified data

Cited: 1
Authors
Flechl, Martin [1 ]
Yin, Shou-Chun [1 ]
Park, Junho [1 ]
Skala, Peter [1 ]
Affiliations
[1] Nuance Commun Inc, Burlington, MA 01803 USA
Source
INTERSPEECH 2022
Keywords
speech recognition; ASR; end-to-end; de-identification; privacy; conformer; transducer; text-to-speech
DOI
10.21437/Interspeech.2022-10484
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Classification Code
070206; 082403
Abstract
De-identification of data used for automatic speech recognition (ASR) modeling is a critical component in protecting privacy, especially in the medical domain. However, simply removing all personally identifiable information (PII) from end-to-end model training data leads to a significant performance degradation, in particular for the recognition of names, dates, locations, and words from similar categories. We propose and evaluate a two-step method for partially recovering this loss. First, PII is identified, and each occurrence is replaced with a random word sequence of the same category. Then, corresponding audio is produced via text-to-speech or by splicing together matching audio fragments extracted from the corpus. These artificial audio/label pairs, together with speaker turns from the original data without PII, are used to train models. We evaluate this method on in-house data of medical conversations and observe a recovery of almost the entire performance degradation in the general word error rate while maintaining strong diarization performance. Our main focus is the improvement of recall and precision in the recognition of PII-related words. Depending on the PII category, between 50% and 90% of the performance degradation can be recovered using our proposed method.
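The abstract outlines a two-step substitution pipeline. As an illustration of the first step only, the following minimal Python sketch replaces tagged PII spans in a transcript with random word sequences of the same category. The category names, surrogate word lists, and the Span structure are illustrative assumptions rather than the authors' implementation, and the second step (generating matching audio via text-to-speech or fragment splicing) is not shown.

```python
# Minimal sketch of the first step described in the abstract: replacing tagged
# PII spans in a transcript with random word sequences of the same category.
# Categories, surrogate pools, and the Span type are illustrative assumptions.
import random
from dataclasses import dataclass


@dataclass
class Span:
    start: int      # character offset where the PII span begins
    end: int        # character offset just past the PII span
    category: str   # e.g. "NAME", "DATE", "LOCATION"


# Hypothetical surrogate pools per PII category.
SURROGATES = {
    "NAME": ["alice johnson", "robert lee", "maria garcia"],
    "DATE": ["march third", "july twenty first", "december ninth"],
    "LOCATION": ["springfield", "riverside clinic", "oakland"],
}


def replace_pii(transcript: str, spans: list[Span], rng: random.Random) -> str:
    """Replace each tagged PII span with a random surrogate of the same category."""
    # Process spans right to left so earlier offsets stay valid after replacement.
    out = transcript
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        surrogate = rng.choice(SURROGATES[span.category])
        out = out[:span.start] + surrogate + out[span.end:]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    text = "patient john smith was seen on may fifth in boston"
    spans = [Span(8, 18, "NAME"), Span(31, 40, "DATE"), Span(44, 50, "LOCATION")]
    print(replace_pii(text, spans, rng))
```

The de-identified transcript produced this way would then be paired with synthetic or spliced audio for the substituted spans before being mixed with the PII-free speaker turns for training.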
Pages: 1382-1386
Page count: 5