A Visual Attention Grounding Neural Model for Multimodal Machine Translation

Cited by: 0
Authors
Zhou, Mingyang [1 ]
Cheng, Runxiang [1 ]
Lee, Yong Jae [1 ]
Yu, Zhou [1 ]
Affiliations
[1] University of California, Davis, Department of Computer Science, Davis, CA 95616, USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information. Our model jointly optimizes the learning of a shared visual-language embedding and a translator. The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics. Our approach achieves competitive state-of-the-art results on the Multi30K and the Ambiguous COCO datasets. We also collected a new multilingual multimodal product description dataset to simulate a real-world international online shopping scenario. On this dataset, our visual attention grounding model outperforms other methods by a large margin.
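To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of the two jointly trained pieces: an image-conditioned attention over the source-sentence encoder states (the "visual attention grounding"), and a shared visual-language embedding trained with a max-margin ranking loss. All module names, dimensions, and the specific loss form are illustrative assumptions, not the authors' exact implementation; in the full model, the encoder states would additionally feed a translation decoder trained with a standard cross-entropy loss.

```python
# Illustrative sketch only: names, dimensions, and the margin loss are
# assumptions inferred from the abstract, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAttentionGrounding(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512,
                 img_dim=2048, joint_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional GRU encoder over the source sentence.
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True,
                              bidirectional=True)
        # Project image features into a query over the encoder states.
        self.img_query = nn.Linear(img_dim, 2 * hid_dim)
        # Project attended text and the image into a shared joint space.
        self.txt_proj = nn.Linear(2 * hid_dim, joint_dim)
        self.img_proj = nn.Linear(img_dim, joint_dim)

    def forward(self, src_tokens, img_feats):
        # src_tokens: (B, T) token ids; img_feats: (B, img_dim) CNN features.
        enc, _ = self.encoder(self.embed(src_tokens))       # (B, T, 2H)
        query = self.img_query(img_feats).unsqueeze(1)      # (B, 1, 2H)
        scores = torch.bmm(query, enc.transpose(1, 2))      # (B, 1, T)
        attn = F.softmax(scores, dim=-1)
        # Image-grounded summary of the source sentence.
        grounded = torch.bmm(attn, enc).squeeze(1)          # (B, 2H)
        txt = F.normalize(self.txt_proj(grounded), dim=-1)
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        return txt, img, enc  # enc would also feed a translation decoder

def embedding_loss(txt, img, margin=0.1):
    # Max-margin ranking loss pulling matched text/image pairs together
    # and pushing mismatched pairs in the batch apart.
    sim = txt @ img.t()                                     # (B, B)
    pos = sim.diag().unsqueeze(1)                           # matched pairs
    cost = (margin + sim - pos).clamp(min=0)
    cost.fill_diagonal_(0)                                  # ignore positives
    return cost.mean()

# Usage with random data (hypothetical sizes):
model = VisualAttentionGrounding(vocab_size=10000)
tokens = torch.randint(0, 10000, (4, 12))
feats = torch.randn(4, 2048)
txt, img, enc = model(tokens, feats)
loss = embedding_loss(txt, img)
```

Joint optimization would then sum this embedding loss with the translation loss, so the shared space and the translator are learned together as the abstract describes.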
Pages: 3643-3653 (11 pages)
Related Papers (50 total)
  • [1] Supervised Visual Attention for Simultaneous Multimodal Machine Translation
    Haralampieva, Veneta
    Caglayan, Ozan
    Specia, Lucia
    [J]. Journal of Artificial Intelligence Research, 2022, 74: 1059-1089
  • [2] A text-based visual context modulation neural model for multimodal machine translation
    Kwon, Soonmo
    Go, Byung-Hyun
    Lee, Jong-Hyeok
    [J]. Pattern Recognition Letters, 2020, 136: 212-218
  • [3] Bilingual-Visual Consistency for Multimodal Neural Machine Translation
    Liu, Yongwen
    Liu, Dongqing
    Zhu, Shaolin
    [J]. Mathematics, 2024, 12 (15)
  • [4] Neural Machine Translation with Target-Attention Model
    Yang, Mingming
    Zhang, Min
    Chen, Kehai
    Wang, Rui
    Zhao, Tiejun
    [J]. IEICE Transactions on Information and Systems, 2020, E103D (03): 684-694
  • [5] Look Harder: A Neural Machine Translation Model with Hard Attention
    Indurthi, Sathish
    Chung, Insoo
    Kim, Sangha
    [J]. 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), 2019: 3037-3043
  • [6] Neural Machine Translation with GRU-Gated Attention Model
    Zhang, Biao
    Xiong, Deyi
    Xie, Jun
    Su, Jinsong
    [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31 (11): 4688-4698
  • [7] Recurrent Attention for Neural Machine Translation
    Zeng, Jiali
    Wu, Shuangzhi
    Yin, Yongjing
    Jiang, Yufan
    Li, Mu
    [J]. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021: 3216-3225
  • [8] Neural Machine Translation with Deep Attention
    Zhang, Biao
    Xiong, Deyi
    Su, Jinsong
    [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42 (01): 154-163
  • [9] Attention-via-Attention Neural Machine Translation
    Zhao, Shenjian
    Zhang, Zhihua
    [J]. Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 563-570