Improving Readability for Automatic Speech Recognition Transcription

Cited by: 5
Authors
Liao, Junwei [1]
Eskimez, Sefik [2]
Lu, Liyang [2]
Shi, Yu [2]
Gong, Ming [3]
Shou, Linjun [3]
Qu, Hong [1]
Zeng, Michael [2]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Microsoft Speech & Dialogue Res Grp, New York, NY USA
[3] Microsoft STCA NLP Grp, Beijing, Peoples R China
Keywords
Automatic speech recognition; post-processing for readability; data synthesis; pre-trained model; PUNCTUATION; CAPITALIZATION;
DOI
10.1145/3557894
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Modern Automatic Speech Recognition (ASR) systems can achieve high recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read because of grammatical errors, disfluencies, and other noise common in spoken communication. These readability issues, introduced by both speakers and ASR systems, impair the performance of downstream tasks and the understanding of human readers. In this work, we present a task called ASR post-processing for readability (APR) and formulate it as a sequence-to-sequence text generation problem. The APR task aims to transform noisy ASR output into text that is readable for humans and usable by downstream tasks while preserving the speaker's intended meaning. We study the APR task in terms of the benchmark dataset, evaluation metrics, and baseline models. First, to address the lack of task-specific data, we propose a method to construct a dataset for the APR task from data collected for grammatical error correction. Second, we use metrics adapted or borrowed from similar tasks to evaluate model performance on the APR task. Lastly, we use several typical or adapted pre-trained models as baselines for the APR task. We fine-tune the baseline models on the constructed dataset and compare their performance with a traditional pipeline method using the proposed evaluation metrics. Experimental results show that all the fine-tuned baseline models perform better than the traditional pipeline method, and our adapted RoBERTa model outperforms the pipeline method by 4.95 and 6.63 BLEU points on the two test sets, respectively. The human evaluation and case study further demonstrate the ability of the proposed model to improve the readability of ASR transcripts.
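To make the sequence-to-sequence framing concrete, here is a minimal sketch of how one source/target training pair might be shaped from a grammatical error correction (GEC) example. This is an illustration only, not the authors' data construction method: the abstract does not say how the noisy side is produced, so the helper names `simulate_asr_style` and `make_apr_pair`, and the choice to approximate ASR-style text by lowercasing and dropping punctuation, are assumptions made here.

```python
import string
from typing import Tuple

# Hypothetical helper (not from the paper): approximate the look of raw ASR
# output by lowercasing and replacing punctuation with spaces. Apostrophes are
# kept so contractions such as "don't" survive.
_DROP = string.punctuation.replace("'", "")
_TABLE = str.maketrans({ch: " " for ch in _DROP})


def simulate_asr_style(text: str) -> str:
    """Return a lowercase, punctuation-free, single-spaced version of text."""
    return " ".join(text.lower().translate(_TABLE).split())


def make_apr_pair(gec_source: str, gec_target: str) -> Tuple[str, str]:
    """Build one (noisy input, readable output) pair from a GEC pair.

    The ungrammatical GEC source stands in for the spoken side; the corrected
    GEC target, with its punctuation and capitalization, is the readable side.
    """
    return simulate_asr_style(gec_source), gec_target


pair = make_apr_pair(
    "Yesterday, I go to the store.",
    "Yesterday, I went to the store.",
)
print(pair)
```

A sequence-to-sequence model would be trained to map the first element of each pair to the second, so that a single model restores punctuation and capitalization and corrects grammar in one step instead of chaining separate components.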
Pages: 23