The Effects of Automatic Speech Recognition Quality on Human Transcription Latency

Cited by: 16
Authors
Gaur, Yashesh [1]
Lasecki, Walter S. [2]
Metze, Florian [1]
Bigham, Jeffrey P. [3, 4]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA
[3] Carnegie Mellon Univ, HCI Inst, Pittsburgh, PA 15213 USA
[4] Carnegie Mellon Univ, LT Inst, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA);
Keywords
Captioning; Human Computation; Automatic Speech Recognition; Crowd Programming;
DOI
10.1145/2899475.2899478
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Transcription makes speech accessible to deaf and hard of hearing people. This conversion of speech to text is still done manually by humans, despite high cost, because the quality of automated speech recognition (ASR) is still too low in real-world settings. Manual conversion can require more than 5 times the original audio time, which also introduces significant latency. Giving transcriptionists ASR output as a starting point seems like a reasonable approach to making humans more efficient and thereby reducing this cost, but the effectiveness of this approach depends on the quality of the speech recognition output. At high error rates, fixing inaccurate speech recognition output may take longer than producing the transcription from scratch, and transcriptionists may not realize when the ASR output is too inaccurate to be useful. In this paper, we empirically explore how the latency of transcriptions created by participants recruited on Amazon Mechanical Turk varies with the accuracy of speech recognition output. We present results from two studies, which indicate that starting with the ASR output is worse unless it is sufficiently accurate (Word Error Rate below 30%).
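The 30% threshold above is stated in terms of Word Error Rate (WER). As a rough illustration only, the sketch below computes WER in its standard form (word-level edit distance divided by the number of reference words) and compares it against that threshold. The function names, the whitespace tokenization, and the idea of using the threshold as a gate are assumptions made for this example; the abstract does not describe how the authors computed WER or applied the result.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are split on whitespace with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def asr_draft_is_useful(wer: float, threshold: float = 0.30) -> bool:
    """Hypothetical gate: hand the ASR draft to a transcriptionist only if WER is under the threshold."""
    return wer < threshold


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER = {wer:.3f}, start from ASR draft: {asr_draft_is_useful(wer)}")
```

In practice the reference transcript is not available when the decision has to be made, so a deployed system would need an estimate of WER (for example, from recognizer confidence scores); that step is outside what the abstract covers.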
Pages: 8