Lattice Transformer for Speech Translation

Cited by: 0
Authors
Zhang, Pei [1]
Chen, Boxing [1]
Ge, Niyu [1]
Fan, Kai [1]
Affiliations
[1] Alibaba Group Inc., Hangzhou, Zhejiang, People's Republic of China
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recent advances in sequence modeling have highlighted the strengths of the transformer architecture, especially in achieving state-of-the-art machine translation results. However, depending on the upstream systems, e.g., speech recognition or word segmentation, the input to a translation system can vary greatly. The goal of this work is to extend the attention mechanism of the transformer to naturally consume a lattice in addition to the traditional sequential input. We first propose a general lattice transformer for speech translation, whose input is the output of an automatic speech recognition (ASR) system containing multiple paths and posterior scores. To leverage the extra information in the lattice structure, we develop a novel controllable lattice attention mechanism to obtain latent representations. On the LDC Spanish-English speech translation corpus, our experiments show that the lattice transformer generalizes significantly better and outperforms both a transformer baseline and a lattice LSTM. Additionally, we validate our approach on the WMT 2017 Chinese-English translation task with lattice inputs from different BPE segmentations, where we also observe improvements over strong baselines.
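The abstract's central idea, attention that consumes a lattice carrying posterior scores instead of a single sequence, can be made concrete. The following is a minimal NumPy sketch of one plausible reading of such a mechanism: self-attention logits are masked so that each lattice token attends only to tokens on a shared path, and a scalar weight lam biases attention toward tokens with higher ASR posterior probability. The names lattice_attention, reach, log_post, and lam are illustrative assumptions, not the paper's actual formulation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lattice_attention(Q, K, V, reach, log_post, lam=1.0):
    """Scaled dot-product attention restricted to a lattice (illustrative).

    Q, K, V  : (n, d) arrays, one row per lattice token.
    reach    : (n, n) boolean mask; reach[i, j] is True iff tokens i and j
               lie on a common path through the lattice.
    log_post : (n,) log posterior score of each token from the ASR lattice.
    lam      : how strongly posteriors bias attention (0 ignores them); a
               stand-in for the paper's "controllable" weighting.
    """
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)              # standard transformer scores
    logits = logits + lam * log_post[None, :]  # favor high-confidence tokens
    logits = np.where(reach, logits, -1e9)     # forbid off-path attention
    return softmax(logits, axis=-1) @ V

# Toy lattice for "a (b|c) d": tokens 0..3 = a, b, c, d.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
reach = np.ones((4, 4), dtype=bool)
reach[1, 2] = reach[2, 1] = False              # b and c never share a path
log_post = np.log(np.array([1.0, 0.7, 0.3, 1.0]))
out = lattice_attention(Q, K, V, reach, log_post, lam=0.5)  # shape (4, 8)

With lam = 0 and a fully True reach mask, the sketch reduces to ordinary sequential self-attention, consistent with the abstract's claim that lattice input generalizes the traditional sequential input.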
Pages: 6475-6484
Number of pages: 10
Related papers (50 in total)
  • [1] Zhang, RQ; Kikui, G. Integration of speech recognition and machine translation: Speech recognition word lattice translation. SPEECH COMMUNICATION, 2006, 48(3-4): 321-334.
  • [2] Kano, Takatomo; Sakti, Sakriani; Nakamura, Satoshi. Transformer-Based Direct Speech-to-Speech Translation with Transcoder. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021: 958-965.
  • [3] Arya, Lalaram; Chowdhury, Amartya Roy; Prasanna, S. R. Mahadeva. Direct vs Cascaded Speech-to-Speech Translation Using Transformer. SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339: 258-270.
  • [4] Ma, Xutai; Wang, Yongqiang; Dousti, Mohammad Javad; Koehn, Philipp; Pino, Juan. Streaming Simultaneous Speech Translation with Augmented Memory Transformer. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 7523-7527.
  • [5] Raffel, Matthew; Chen, Lizhong. Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023: 12900-12907.
  • [6] Xiao, Fengshun; Li, Jiangtong; Zhao, Hai; Wang, Rui; Chen, Kehai. Lattice-Based Transformer Encoder for Neural Machine Translation. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019: 3090-3097.
  • [7] Fang, Qingkai; Zhou, Yan; Feng, Yang. DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [8] Deng, Keqi; Watanabe, Shinji; Shi, Jiatong; Arora, Siddhant. Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. INTERSPEECH 2022, 2022: 1746-1750.
  • [9] Saon, George; Picheny, Michael. Lattice-based Viterbi decoding techniques for speech translation. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007: 386-389.
  • [10] Fukuda, Ryo; Novitasari, Sashi; Oka, Yui; Kano, Yasumasa; Yano, Yuki; Ko, Yuka; Tokuyama, Hirotaka; Doi, Kosuke; Yanagita, Tomoya; Sakti, Sakriani; Sudoh, Katsuhito; Nakamura, Satoshi. Simultaneous Speech-to-Speech Translation System with Transformer-Based Incremental ASR, MT, and TTS. 2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021: 186-192.