Speech recognition for medical conversations

Cited by: 0
Authors
Chiu, Chung-Cheng [1 ]
Tripathi, Anshuman
Chou, Katherine
Co, Chris
Jaitly, Navdeep
Jaunzeikare, Diana
Kannan, Anjuli
Nguyen, Patrick
Sak, Hasim
Sankar, Ananth [1 ,2 ]
Tansuwan, Justin
Wan, Nathan [1 ]
Wu, Yonghui
Zhang, Xuedong [1 ]
Affiliations
[1] Google, Mountain View, CA USA
[2] LinkedIn, Mountain View, CA USA
Keywords
medical transcription; conversational transcription; end-to-end attention models; CTC
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we document our experiences with developing speech recognition for medical transcription: a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines: a Connectionist Temporal Classification (CTC) phoneme-based model and a Listen, Attend and Spell (LAS) grapheme-based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in the corpus, a significant amount of effort was invested in data cleaning. We describe the two-stage strategy we followed for segmenting the data. The data cleanup and the development of a matched language model were essential to the success of the CTC-based models. The LAS-based models, however, were found to be resilient to alignment and transcript noise and did not require a language model. The CTC models achieved a word error rate of 20.1%, and the LAS models achieved 18.3%. Our analysis shows that both models perform well on important medical utterances and can therefore be practical for transcribing medical conversations.
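The abstract contrasts a phoneme-based CTC model with a grapheme-based LAS model. As a rough illustration of the CTC line only (not the paper's actual architecture or data), the sketch below trains a stand-in bidirectional-LSTM acoustic model with PyTorch's built-in CTC loss; the phoneme inventory size, feature dimension, and layer sizes are assumptions made for the example.

```python
# Minimal sketch of CTC phoneme-based training (illustrative only; not the
# model described in the paper). Encoder size, feature dimension, and the
# phoneme inventory are assumed values.
import torch
import torch.nn as nn

NUM_PHONEMES = 42   # assumed phoneme inventory; index 42 is reserved for the CTC blank
FEATURE_DIM = 80    # e.g. log-mel filterbank features per frame


class CTCAcousticModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in acoustic encoder: a bidirectional LSTM over frame features.
        self.encoder = nn.LSTM(FEATURE_DIM, 256, num_layers=3,
                               bidirectional=True, batch_first=True)
        # Per-frame logits over phonemes plus the blank symbol.
        self.output = nn.Linear(2 * 256, NUM_PHONEMES + 1)

    def forward(self, features):
        hidden, _ = self.encoder(features)           # (batch, frames, 512)
        return self.output(hidden).log_softmax(-1)   # (batch, frames, classes)


model = CTCAcousticModel()
ctc_loss = nn.CTCLoss(blank=NUM_PHONEMES)

# Toy batch: 2 utterances of up to 100 frames with padded phoneme targets.
features = torch.randn(2, 100, FEATURE_DIM)
targets = torch.randint(0, NUM_PHONEMES, (2, 20))
input_lengths = torch.tensor([100, 90])
target_lengths = torch.tensor([20, 15])

log_probs = model(features).transpose(0, 1)   # CTCLoss expects (frames, batch, classes)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

Decoding such a model into words would additionally require a pronunciation lexicon and, as the abstract notes for the CTC line, a matched language model; the LAS grapheme model sidesteps both by emitting characters directly.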
Pages: 2972-2976
Number of pages: 5
Related papers
50 records in total
  • [1] Speech Recognition for Medical Conversations Health Record (MCHR)
    Singh, Nivedita
    Balasubramaniam, M.
    Singh, Jitendra
    PROCEEDINGS OF EMERGING TRENDS AND TECHNOLOGIES ON INTELLIGENT SYSTEMS (ETTIS 2021), 2022, 1371 : 191 - 201
  • [2] Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
    Wu, Wen
    Zhang, Chao
    Woodland, Philip C.
    INTERSPEECH 2023, 2023, : 3607 - 3611
  • [3] SPEECH RECOGNITION ROBUST AGAINST SPEECH OVERLAPPING IN MONAURAL RECORDINGS OF TELEPHONE CONVERSATIONS
    Suzuki, Masayuki
    Kurata, Gakuto
    Nagano, Tohru
    Tachibana, Ryuki
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5685 - 5689
  • [4] Real-time Speech Summarization for Medical Conversations
    Khai Le-Duc
    Khai-Nguyen Nguyen
    Long Vo-Dang
    Truong-Son Hy
    INTERSPEECH 2024, 2024, : 1960 - 1964
  • [5] Speech Recognition and Multi-Speaker Diarization of Long Conversations
    Mao, Huanru Henry
    Li, Shuyang
    McAuley, Julian
    Cottrell, Garrison W.
    INTERSPEECH 2020, 2020, : 691 - 695
  • [6] Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations
    Hsu, Jia-Hao
    Su, Ming-Hsiang
    Wu, Chung-Hsien
    Chen, Yi-Hsuan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 (29) : 1675 - 1686
  • [7] AUTOMATED SPEECH RECOGNITION IN MEDICAL APPLICATIONS
    GRASSO, MA
M D COMPUTING, 1995, 12 (01) : 16 - &
  • [8] TOWARDS MEASURING FAIRNESS IN SPEECH RECOGNITION: CASUAL CONVERSATIONS DATASET TRANSCRIPTIONS
    Liu, Chunxi
    Picheny, Michael
    Sari, Leda
    Chitkara, Pooja
    Xiao, Alex
    Zhang, Xiaohui
    Chou, Mark
    Alvarado, Andres
    Hazirbas, Caner
    Saraf, Yatharth
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6162 - 6166
  • [9] Medical Speech Recognition: Reaching Parity with Humans
    Edwards, Erik
    Salloum, Wael
    Finley, Greg P.
    Fone, James
    Cardiff, Greg
    Miller, Mark
    Suendermann-Oeft, David
    SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 512 - 524
  • [10] Requirements for speech recognition to support medical documentation
    Mönnich, G
    Wetter, T
    METHODS OF INFORMATION IN MEDICINE, 2000, 39 (01) : 63 - 69