Supervised Visual Attention for Simultaneous Multimodal Machine Translation

Authors
Haralampieva, Veneta [1]
Caglayan, Ozan [1]
Specia, Lucia [1]
Affiliation
[1] Department of Computing, Imperial College London, United Kingdom
Funding
European Union Horizon 2020
Keywords
Machine translation; Multimodal; Multimodal system; Partial information; Region alignments; Simultaneous translation; Translation quality; Visual attention; Visual attention mechanisms; Visual context
DOI
10.1613/JAIR.1.13546
Abstract
There has been a surge in research on multimodal machine translation (MMT), where additional modalities such as images are used to improve the translation quality of text-only systems. One particular use of such multimodal systems is simultaneous machine translation, where visual context has been shown to complement the partial information provided by the source sentence, especially in the early phases of translation. In this paper, we propose the first Transformer-based simultaneous MMT architecture. Additionally, we extend this model with an auxiliary supervision signal that guides the visual attention mechanism using labelled phrase-region alignments. We perform comprehensive experiments on three language directions and conduct thorough quantitative and qualitative analyses using both automatic metrics and manual inspection. Our results show that (i) supervised visual attention consistently improves the translation quality of the simultaneous MMT models, and (ii) fine-tuning the MMT model with the supervision loss enabled leads to better performance than training it from scratch. Compared to the state of the art, our proposed model achieves improvements of up to 2.3 BLEU and 3.5 METEOR points. © 2022 AI Access Foundation.
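The auxiliary supervision signal for visual attention can be illustrated with a minimal sketch. It assumes, hypothetically, that each target token's visual attention is a probability distribution over image regions, that the labelled phrase-region alignments arrive as a mapping from token index to region indices, and that the supervision term is a cross-entropy against a uniform distribution over the aligned regions, added to the translation loss with a weight `lam`. The function names, data format, and exact loss form are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw attention scores for one token."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_supervision_loss(
    attn: Sequence[Sequence[float]],
    alignments: Dict[int, Sequence[int]],
    eps: float = 1e-9,
) -> float:
    """Cross-entropy between predicted visual attention and gold region alignments.

    attn[t][r]  : attention weight of target token t over image region r (rows sum to 1).
    alignments  : hypothetical format mapping a token index to its aligned region indices;
                  tokens without a labelled region are simply absent and skipped.
    The gold distribution places equal probability mass on every aligned region.
    """
    losses = []
    for t, regions in alignments.items():
        if not regions:
            continue
        target_mass = 1.0 / len(regions)
        # eps guards against log(0) when attention on an aligned region is exactly zero
        ce = -sum(target_mass * math.log(attn[t][r] + eps) for r in regions)
        losses.append(ce)
    # Average over supervised tokens; no alignments means no auxiliary signal
    return sum(losses) / len(losses) if losses else 0.0


def total_loss(translation_nll: float, attn_loss: float, lam: float = 1.0) -> float:
    """Hypothetical joint objective: translation loss plus weighted attention supervision."""
    return translation_nll + lam * attn_loss


if __name__ == "__main__":
    # Two target tokens attending over three image regions.
    attn = [softmax([2.0, 0.1, 0.1]), softmax([0.0, 0.0, 0.0])]
    # Token 0 is aligned to region 0; token 1 has no labelled region.
    aux = attention_supervision_loss(attn, {0: [0]})
    print(aux, total_loss(translation_nll=3.0, attn_loss=aux, lam=0.5))
```

Tokens without a labelled region contribute nothing to the auxiliary term, so the signal only constrains attention where an alignment exists, while `lam` trades translation likelihood against agreement with the gold alignments.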
Pages: 1059-1089
Related papers (50 in total)
  • [1] Supervised Visual Attention for Simultaneous Multimodal Machine Translation
    Haralampieva, Veneta
    Caglayan, Ozan
    Specia, Lucia
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2022, 74 : 1059 - 1089
  • [2] A Visual Attention Grounding Neural Model for Multimodal Machine Translation
    Zhou, Mingyang
    Cheng, Runxiang
    Lee, Yong Jae
    Yu, Zhou
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3643 - 3653
  • [3] Simultaneous Machine Translation with Visual Context
    Caglayan, Ozan
    Ive, Julia
    Haralampieva, Veneta
    Madhyastha, Pranava
    Barrault, Loic
    Specia, Lucia
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2350 - 2361
  • [4] Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation
    Ive, Julia
    Li, Andy Mingren
    Miao, Yishu
    Caglayan, Ozan
    Madhyastha, Pranava
    Specia, Lucia
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 3222 - 3233
  • [5] Probing the Need for Visual Context in Multimodal Machine Translation
    Caglayan, Ozan
    Madhyastha, Pranava
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4159 - 4170
  • [6] Multimodal Machine Translation with Fusion of Generated Visual Information
    Yuan, Jiaqi
    Shi, Xiayang
    Niu, Yue
    Niu, Yufeng
    Wang, Xuhui
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 150 - 156
  • [7] Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
    Arivazhagan, Naveen
    Cherry, Colin
    Macherey, Wolfgang
    Chiu, Chung-Cheng
    Yavuz, Semih
    Pang, Ruoming
    Li, Wei
    Raffel, Colin
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1313 - 1323
  • [8] Simultaneous neural machine translation with a reinforced attention mechanism
    Lee, YoHan
    Shin, JongHun
    Kim, YoungKil
    [J]. ETRI JOURNAL, 2021, 43 (05) : 775 - 786
  • [9] DEEPLY SUPERVISED MULTIMODAL ATTENTIONAL TRANSLATION EMBEDDINGS FOR VISUAL RELATIONSHIP DETECTION
    Gkanatsios, Nikolaos
    Pitsikalis, Vassilis
    Koutras, Petros
    Zlatintsi, Athanasia
    Maragos, Petros
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1840 - 1844
  • [10] Multimodal supervised image translation
    Ruan, Congcong
    Chen, Dihu
    Hu, Haifeng
    [J]. ELECTRONICS LETTERS, 2019, 55 (04) : 190 - 191