TOWARDS UNSUPERVISED SPEECH-TO-TEXT TRANSLATION

Authors
Chung, Yu-An [1]
Weng, Wei-Hung [1]
Tong, Schrasing [1]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
speech-to-text translation; unsupervised speech processing; speech2vec; bilingual lexicon induction
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, that is, speech utterances from a source language and independent text from a target language. Unlike traditional cascaded systems and end-to-end architectures, our system does not require any labeled data (i.e., transcribed source audio or parallel source and target text corpora) during training, making it especially applicable to language pairs with very few or even zero bilingual resources. The framework initializes the ST system with a cross-modal bilingual dictionary inferred from the monolingual corpora, which maps every source speech segment corresponding to a spoken word to its target text translation. For unseen source speech utterances, the system first performs word-by-word translation on each speech segment in the utterance. The translation is then improved by leveraging a language model and a sequence denoising autoencoder, which provide prior knowledge about the target language. Experimental results show that our unsupervised system achieves BLEU scores comparable to those of supervised end-to-end models despite the lack of supervision. We also provide an ablation analysis to examine the utility of each component in our system.
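The word-by-word translation step described in the abstract can be illustrated with a minimal sketch. It assumes that each spoken-word segment has already been embedded as a fixed-length vector, that a linear map `W` into the target text embedding space has already been learned, and that translation picks the nearest target word by cosine similarity. The names `apply_map`, `translate_word_by_word`, `W`, and `target_vocab`, as well as the toy vectors, are hypothetical and not taken from the paper; the language-model and denoising-autoencoder refinement is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def apply_map(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def translate_word_by_word(segments, W, target_vocab):
    """Map each speech-segment embedding into the target space and
    return the nearest target word for each segment (assumed lookup rule)."""
    output = []
    for seg in segments:
        mapped = apply_map(W, seg)
        best = max(target_vocab, key=lambda word: cosine(mapped, target_vocab[word]))
        output.append(best)
    return output


# Toy data (hypothetical): a 2-D space where W swaps the two axes.
W = [[0.0, 1.0], [1.0, 0.0]]
target_vocab = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
segments = [[0.1, 0.9], [0.8, 0.2]]

result = translate_word_by_word(segments, W, target_vocab)
print(result)
```

In a full system along the lines the abstract describes, this raw word-by-word output would then be rescored or rewritten using target-language priors.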
Pages: 7170-7174
Number of pages: 5
Related papers
50 in total
  • [1] Consecutive Decoding for Speech-to-text Translation
    Dong, Qianqian
    Wang, Mingxuan
    Zhou, Hao
    Xu, Shuang
    Xu, Bo
    Li, Lei
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 12738 - 12748
  • [2] Quaero Speech-to-Text and Text Translation Evaluation Systems
    Stueker, Sebastian
    Kilgour, Kevin
    Niehues, Jan
    [J]. HIGH PERFORMANCE COMPUTING IN SCIENCE AND ENGINEERING '10, 2011, : 529 - +
  • [3] Low-Resource Speech-to-Text Translation
    Bansal, Sameer
    Kamper, Herman
    Livescu, Karen
    Lopez, Adam
    Goldwater, Sharon
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1298 - 1302
  • [4] Recent Advances in Direct Speech-to-text Translation
    Xu, Chen
    Ye, Rong
    Dong, Qianqian
    Zhao, Chengqi
    Ko, Tom
    Wang, Mingxuan
    Xiao, Tong
    Zhu, Jingbo
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 6796 - 6804
  • [5] Improved Machine Translation of Speech-to-Text outputs
    Dechelotte, Daniel
    Schwenk, Holger
    Adda, Gilles
    Gauvain, Jean-Luc
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2632 - 2635
  • [6] Back Translation for Speech-to-text Translation Without Transcripts
    Fang, Qingkai
    Feng, Yang
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 4567 - 4587
  • [7] Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
    Liu, Yuchen
    Zhang, Jiajun
    Xiong, Hao
    Zhou, Long
    He, Zhongjun
    Wu, Hua
    Wang, Haifeng
    Zong, Chengqing
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8417 - 8424
  • [8] Significance of Audio Quality in Speech-to-Text Translation Systems
    Rajkhowa, Tonmoy
    Chowdhury, Amartya Roy
    Prasanna, S. R. Mahadeva
    [J]. SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 32 - 42
  • [9] Learning Shared Semantic Space for Speech-to-Text Translation
    Han, Chi
    Wang, Mingxuan
    Ji, Heng
    Li, Lei
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2214 - 2225
  • [10] TOWARDS ROBUST SPEECH-TO-TEXT ADVERSARIAL ATTACK
    Esmaeilpour, Mohammad
    Cardinal, Patrick
    Koerich, Alessandro Lameiras
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2869 - 2873