Speech recognition for medical conversations

Cited by: 0
Authors
Chiu, Chung-Cheng [1]
Tripathi, Anshuman
Chou, Katherine
Co, Chris
Jaitly, Navdeep
Jaunzeikare, Diana
Kannan, Anjuli
Nguyen, Patrick
Sak, Hasim
Sankar, Ananth [1, 2]
Tansuwan, Justin
Wan, Nathan [1]
Wu, Yonghui
Zhang, Xuedong [1]
Affiliations
[1] Google, Mountain View, CA USA
[2] LinkedIn, Mountain View, CA USA
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
medical transcription; conversational transcription; end-to-end attention models; CTC
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we document our experiences with developing speech recognition for medical transcription: a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines: a Connectionist Temporal Classification (CTC) phoneme-based model and a Listen, Attend and Spell (LAS) grapheme-based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in the corpus, a significant amount of effort was invested in data cleaning. We describe a two-stage strategy we followed for segmenting the data. The data cleanup and the development of a matched language model were essential to the success of the CTC-based models. The LAS-based models, however, were found to be resilient to alignment and transcript noise and did not require the use of language models. The CTC models achieved a word error rate of 20.1%, and the LAS models achieved 18.3%. Our analysis shows that both models perform well on important medical utterances and can therefore be practical for transcribing medical conversations.
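The abstract reports word error rates but does not say how they were scored. For readers unfamiliar with the metric, the following is a minimal sketch of the conventional definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and real scoring pipelines usually apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes whitespace
    tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one deleted word out of five reference words.
    print(word_error_rate("the patient has a fever", "the patient has fever"))
```

Under this definition, a rate of 20.1% corresponds to roughly one word-level error for every five reference words.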
Pages: 2972-2976
Number of pages: 5