Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding

Cited by: 0
Authors
Liu, Yuchen [1, 2]
Zhang, Jiajun [1, 2]
Xiong, Hao [4]
Zhou, Long [1, 2]
He, Zhongjun [4]
Wu, Hua [4]
Wang, Haifeng [4]
Zong, Chengqing [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 10 Shangdi 10th St, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, 10 Shangdi 10th St, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, 10 Shangdi 10th St, Beijing, Peoples R China
[4] Baidu Inc, 10 Shangdi 10th St, Beijing, Peoples R China
DOI: N/A
Chinese Library Classification (CLC) number: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Speech-to-text translation (ST), which translates source-language speech into target-language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, an end-to-end ST model offers potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to build such a model without intermediate transcriptions. Existing works generally apply multi-task learning, jointly training end-to-end ST with automatic speech recognition (ASR) to improve translation quality. However, the tasks in this approach cannot use information from each other, which limits the improvement. Other works propose a two-stage model in which the second model can use the hidden states of the first, but this cascaded design greatly reduces the efficiency of training and inference. In this paper, we propose a novel interactive attention mechanism that enables ASR and ST to be performed synchronously and interactively in a single model. Specifically, the generation of transcriptions and translations relies not only on each task's own previous outputs but also on the outputs predicted by the other task. Experiments on TED speech translation corpora show that our proposed model can outperform strong baselines in speech translation quality and achieve better speech recognition performance as well.
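The abstract describes interactive attention only at a high level. The following is a minimal sketch of the idea, assuming standard scaled dot-product attention and a hypothetical fusion rule `h_self + lam * tanh(h_cross)`. The names `interactive_attention`, `synchronous_step`, and the weight `lam` are illustrative assumptions, not taken from the paper. At each step, the ASR and ST decoders advance together. Each decoder attends to its own history and to the states the other decoder produced at earlier steps.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def interactive_attention(query: Vector, own_states: List[Vector],
                          other_states: List[Vector], lam: float = 0.3) -> Vector:
    """Attend over this task's own history, then add a tanh-squashed attention
    read over the other task's history (an assumed, hypothetical fusion rule)."""
    h_self = attend(query, own_states, own_states)
    if not other_states:  # the other task has produced no states yet
        return h_self
    h_cross = attend(query, other_states, other_states)
    return [a + lam * math.tanh(b) for a, b in zip(h_self, h_cross)]


def synchronous_step(asr_hist: List[Vector], st_hist: List[Vector],
                     asr_query: Vector, st_query: Vector,
                     lam: float = 0.3) -> Tuple[Vector, Vector]:
    """One synchronous decoding step: both decoders advance together, and each
    reads only the other decoder's states from previous steps."""
    asr_out = interactive_attention(asr_query, asr_hist + [asr_query], st_hist, lam)
    st_out = interactive_attention(st_query, st_hist + [st_query], asr_hist, lam)
    return asr_out, st_out


if __name__ == "__main__":
    # Toy queries standing in for the decoders' per-step inputs (hypothetical).
    asr_hist: List[Vector] = []
    st_hist: List[Vector] = []
    queries = [([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [1.0, 0.0])]
    for asr_q, st_q in queries:
        a, s = synchronous_step(asr_hist, st_hist, asr_q, st_q)
        asr_hist.append(a)
        st_hist.append(s)
    print(len(asr_hist), len(st_hist))  # prints "2 2"
```

Both outputs at step t are computed from the histories as they stood before step t. This keeps the two decoders symmetric and lets them run in parallel within a step, unlike a two-stage cascade. In a real model the queries, keys, and values would come from learned projections of decoder states. This sketch uses the raw vectors directly for brevity.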
Pages: 8417-8424
Number of pages: 8