Adapting End-to-End Speech Recognition for Readable Subtitles

被引：0

作者：

Liu, Danni ^{[1
]}

Niehues, Jan ^{[1
]}

Spanakis, Gerasimos ^{[1
]}

机构：

[1] Maastricht Univ, Dept Data Sci & Knowledge Engn, Maastricht, Netherlands

来源：

17TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2020) | 2020年

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task challenging for supervised approaches due to the scarcity of training data. We first investigate a cascaded system, where an unsupervised compression model is used to post-edit the transcribed speech. We then compare several methods of end-to-end speech recognition under output length constraints. The experiments show that with limited data far less than needed for training a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities. Furthermore, the best performance in terms of WER and ROUGE scores is achieved by explicitly modeling the length constraints within the end-to-end ASR system.

引用

页码：247 / 256

页数：10

共 50 条

[1] END-TO-END MULTIMODAL SPEECH RECOGNITION
Palaskar, Shruti
Sanabria, Ramon
Metze, Florian
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5774 - 5778
[2] End-to-End Speech Recognition in Russian
Markovnikov, Nikita
Kipyatkova, Irina
Lyakso, Elena
[J]. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 377 - 386
[3] Overview of end-to-end speech recognition
Wang, Song
Li, Guanyu
[J]. 2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
[4] Multichannel End-to-end Speech Recognition
Ochiai, Tsubasa
Watanabe, Shinji
Hori, Takaaki
Hershey, John R.
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
[5] End-to-end Accented Speech Recognition
Viglino, Thibault
Motlicek, Petr
Cernak, Milos
[J]. INTERSPEECH 2019, 2019, : 2140 - 2144
[6] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
Petridis, Stavros
Stafylakis, Themos
Ma, Pingchuan
Cai, Feipeng
Tzimiropoulos, Georgios
Pantic, Maja
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6548 - 6552
[7] END-TO-END ANCHORED SPEECH RECOGNITION
Wang, Yiming
Fan, Xing
Chen, I-Fan
Liu, Yuzong
Chen, Tongfei
Hoffmeister, Bjorn
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
[8] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
Liu, Da-Rong
Yang, Chi-Yu
Wu, Szu-Lin
Lee, Hung-Yi
[J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
[9] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
Kim, Chanwoo
Kim, Sungsoo
Kim, Kwangyoun
Kumar, Mehul
Kim, Jiyeon
Lee, Kyungmin
Han, Changwoo
Garg, Abhinav
Kim, Eunhyang
Shin, Minkyoo
Singh, Shatrughan
Heck, Larry
Gowda, Dhananjaya
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569
[10] END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS
Petridis, Stavros
Li, Zuwei
Pantic, Maja
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2592 - 2596

← 1 2 3 4 5 →