ANALYZING ASR PRETRAINING FOR LOW-RESOURCE SPEECH-TO-TEXT TRANSLATION

Cited: 0
Authors
Stoian, Mihaela C. [1 ]
Bansal, Sameer [1 ]
Goldwater, Sharon [1 ]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
Keywords
speech-to-text translation; transfer learning; pretraining; speech recognition; data augmentation; NEURAL-NETWORKS; RECURRENT
DOI
10.1109/icassp40776.2020.9053847
CLC Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Previous work has shown that for low-resource source languages, automatic speech-to-text translation (AST) can be improved by pretraining an end-to-end model on automatic speech recognition (ASR) data from a high-resource language. However, it is not clear what factors (e.g., language relatedness or size of the pretraining data) yield the biggest improvements, or whether pretraining can be effectively combined with other methods such as data augmentation. Here, we experiment with pretraining on datasets of varying sizes, including languages related and unrelated to the AST source language. We find that the best predictor of final AST performance is the word error rate of the pretrained ASR model, and that differences in ASR/AST performance correlate with how phonetic information is encoded in the later RNN layers of our model. We also show that pretraining and data augmentation yield complementary benefits for AST.
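The two-stage recipe the abstract describes (pretrain an end-to-end model on high-resource ASR, then fine-tune it on low-resource AST) can be pictured in a few lines. The following is a minimal illustrative sketch, not the authors' code: the module names, feature and vocabulary dimensions, and the use of PyTorch are all assumptions for illustration; only the idea of transferring the ASR-pretrained encoder into the AST model comes from the abstract.

# Minimal sketch (hypothetical, PyTorch): pretrain a speech encoder on
# high-resource ASR, then reuse its weights to initialize an AST model
# before fine-tuning on the low-resource translation data.
import torch.nn as nn

class SpeechEncoder(nn.Module):
    # Stacked bidirectional RNN over acoustic features (e.g., 80-dim filterbanks).
    def __init__(self, feat_dim=80, hidden=256, layers=3):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=layers,
                           bidirectional=True, batch_first=True)

    def forward(self, feats):                 # feats: (batch, time, feat_dim)
        out, _ = self.rnn(feats)
        return out                            # (batch, time, 2*hidden)

class Seq2Seq(nn.Module):
    # Encoder-decoder whose decoder emits either ASR transcripts or translations.
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.encoder = SpeechEncoder(hidden=hidden)
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

# Stage 1: pretrain on high-resource ASR data (training loop omitted).
# The word error rate (WER) reached here is what the paper finds to be
# the best predictor of final AST performance.
asr_model = Seq2Seq(vocab_size=5000)

# Stage 2: initialize the AST encoder from the pretrained ASR encoder;
# the decoder is reinitialized because the output vocabulary changes.
ast_model = Seq2Seq(vocab_size=8000)
ast_model.encoder.load_state_dict(asr_model.encoder.state_dict())
# ... fine-tune ast_model on the low-resource AST data ...

Since the abstract also reports complementary gains from data augmentation, the low-resource AST data in stage 2 could additionally be augmented before fine-tuning; speed-perturbing the audio is one common choice, though the abstract does not name the method used.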
Pages: 7909 - 7913
Page count: 5
Related Papers
50 items in total
  • [31] NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022
    Fukuda, Ryo
    Ko, Yuka
    Kano, Yasumasa
    Doi, Kosuke
    Tokuyama, Hirotaka
    Sakti, Sakriani
    Sudoh, Katsuhito
    Nakamura, Satoshi
    [J]. PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2022), 2022, : 286 - 292
  • [32] ON-TRAC' systems for the IWSLT 2021 low-resource speech translation and multilingual speech translation shared tasks
    Le, Hang
    Barbier, Florentin
    Ha Nguyen
    Tomashenko, Natalia
    Mdhaffar, Salima
    Gahbiche, Souhir
    Bougares, Fethi
    Lecouteux, Benjamin
    Schwab, Didier
    Esteve, Yannick
    [J]. IWSLT 2021: THE 18TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION, 2021, : 169 - 174
  • [33] INCORPORATING DISCRIMINATIVE DPGMM POSTERIORGRAMS FOR LOW-RESOURCE ASR
    Wu, Bin
    Sakti, Sakriani
    Nakamura, Satoshi
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 201 - 208
  • [34] Reduce and Reconstruct: ASR for Low-Resource Phonetic Languages
    Diwan, Anuj
    Jyothi, Preethi
    [J]. INTERSPEECH 2021, 2021, : 3445 - 3449
  • [35] Data Augmentation for Low-Resource Quechua ASR Improvement
    Zevallos, Rodolfo
    Bel, Nuria
    Cambara, Guillermo
    Farrus, Mireia
    Luque, Jordi
    [J]. INTERSPEECH 2022, 2022, : 3518 - 3522
  • [36] SYNTHETIC DATA AUGMENTATION FOR IMPROVING LOW-RESOURCE ASR
    Thai, Bao
    Jimerson, Robert
    Arcoraci, Dominic
    Prud'hommeaux, Emily
    Ptucha, Raymond
    [J]. 2019 IEEE WESTERN NEW YORK IMAGE AND SIGNAL PROCESSING WORKSHOP (WNYISPW), 2019
  • [37] Revisiting End-to-End Speech-to-Text Translation From Scratch
    Zhang, Biao
    Haddow, Barry
    Sennrich, Rico
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [38] Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation
    Zheng, Renjie
    Chen, Junkun
    Ma, Mingbo
    Huang, Liang
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [39] Survey of Low-Resource Machine Translation
    Haddow, Barry
    Bawden, Rachel
    Barone, Antonio Valerio Miceli
    Helcl, Jindrich
    Birch, Alexandra
    [J]. COMPUTATIONAL LINGUISTICS, 2022, 48 (03) : 673 - 732
  • [40] Terminology Translation in Low-Resource Scenarios
    Haque, Rejwanul
    Hasanuzzaman, Mohammed
    Way, Andy
    [J]. INFORMATION, 2019, 10 (09)