TOWARDS UNSUPERVISED SPEECH-TO-TEXT TRANSLATION

Cited by: 0

Authors:

Chung, Yu-An [1]

Weng, Wei-Hung [1]

Tong, Schrasing [1]

Glass, James [1]

Affiliation:

[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

speech-to-text translation; unsupervised speech processing; speech2vec; bilingual lexicon induction

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, that is, speech utterances from a source language and independent text from a target language. As opposed to traditional cascaded systems and end-to-end architectures, our system does not require any labeled data (i.e., transcribed source audio or parallel source and target text corpora) during training, making it especially applicable to language pairs with very few or even zero bilingual resources. The framework initializes the ST system with a cross-modal bilingual dictionary inferred from the monolingual corpora, which maps every source speech segment corresponding to a spoken word to its target text translation. For unseen source speech utterances, the system first performs word-by-word translation on each speech segment in the utterance. The translation is then improved by leveraging a language model and a sequence denoising autoencoder to provide prior knowledge about the target language. Experimental results show that our unsupervised system achieves BLEU scores comparable to those of supervised end-to-end models despite the lack of supervision. We also provide an ablation analysis to examine the utility of each component in our system.
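The word-by-word translation step described in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each source speech segment has already been embedded into the same vector space as the target-language words, so that the dictionary lookup reduces to a nearest-neighbor search by cosine similarity. The toy vectors, the vocabulary, and the function names are illustrative assumptions and are not taken from the paper; the language model and denoising autoencoder refinement steps are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_target_word(segment_vec, target_vocab):
    """Return the target word whose embedding is most similar to the segment."""
    return max(target_vocab, key=lambda w: cosine(segment_vec, target_vocab[w]))

def translate_word_by_word(segment_vecs, target_vocab):
    """Translate an utterance segment by segment, with no reordering."""
    return [nearest_target_word(v, target_vocab) for v in segment_vecs]

# Hypothetical toy data: 3-dimensional embeddings in a shared space.
TARGET_VOCAB = {
    "the": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "sleeps": [0.0, 0.0, 1.0],
}
# Assumed embeddings of three spoken-word segments from one source utterance.
UTTERANCE = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]]

print(translate_word_by_word(UTTERANCE, TARGET_VOCAB))
```

In a fuller system along the lines the abstract describes, this raw word-by-word output would then be rescored and corrected using prior knowledge about the target language.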


Pages: 7170-7174

Page count: 5
