Unified Speech-Text Pre-training for Speech Translation and Recognition

Times cited: 0

Authors:

Tang, Yun [1]

Gong, Hongyu [1]

Dong, Ning [1]

Wang, Changhan [1]

Hsu, Wei-Ning [1]

Gu, Jiatao [1]

Baevski, Alexei [1]

Li, Xian [1]

Mohamed, Abdelrahman [1]

Auli, Michael [1]

Pino, Juan [1]

Affiliation:

[1] Meta AI, Menlo Park, CA 94025, USA

Source:

PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: LONG PAPERS | 2022

Keywords: none listed

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition. The proposed method incorporates four self-supervised and supervised subtasks for cross-modality learning. A self-supervised speech subtask leverages unlabeled speech data, and a (self-)supervised text-to-text subtask makes use of abundant text training data. Two auxiliary supervised speech subtasks are included to unify the speech and text modeling spaces. Our contribution lies in integrating linguistic information from the text corpus into speech pre-training. Detailed analysis reveals learning interference among the subtasks. To alleviate this interference, we present two pre-training configurations, one for speech translation and one for speech recognition. Our experiments show that the proposed method can effectively fuse speech and text information into one model. It achieves BLEU improvements of 1.7 to 2.3 over the state of the art on the MuST-C speech translation dataset, and WERs comparable to wav2vec 2.0 on the LibriSpeech speech recognition task.
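The abstract does not say how the four subtask losses are combined during joint pre-training. A common way to train several subtasks in one model is a weighted sum of their losses, and the minimal Python sketch below assumes exactly that. The subtask labels, the `joint_loss` helper, and the weighting scheme are hypothetical illustrations, not the authors' implementation. Setting a weight to zero switches a subtask off, which is one possible way a task-specific configuration could reduce interference.

```python
from typing import Dict

# Hypothetical labels for the four subtasks; the abstract describes them only
# in general terms (one self-supervised speech subtask, one text-to-text
# subtask, two auxiliary supervised speech subtasks).
SUBTASKS = ("speech_self_supervised", "text_to_text", "aux_speech_a", "aux_speech_b")


def joint_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-subtask losses into a single training objective.

    Assumption: a plain weighted sum. Subtasks missing from `weights`
    default to weight 1.0; a weight of 0.0 disables that subtask.
    """
    missing = [name for name in SUBTASKS if name not in losses]
    if missing:
        raise ValueError(f"missing losses for subtasks: {missing}")
    return sum(weights.get(name, 1.0) * losses[name] for name in SUBTASKS)


if __name__ == "__main__":
    batch_losses = {
        "speech_self_supervised": 2.0,
        "text_to_text": 1.5,
        "aux_speech_a": 0.5,
        "aux_speech_b": 1.0,
    }
    # Down-weight the text-to-text subtask in this illustrative configuration.
    print(joint_loss(batch_losses, {"text_to_text": 0.5}))
```

In a real framework the values would be differentiable loss tensors rather than floats, but the combination logic is the same.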


Pages: 1488–1499

Number of pages: 12
