Modal Contrastive Learning Based End-to-End Text Image Machine Translation

Cited by: 0
Authors
Ma, Cong [1 ,2 ]
Han, Xu [1 ,2 ]
Wu, Linghui [1 ,2 ]
Zhang, Yaping [1 ,2 ]
Zhao, Yang [1 ,2 ]
Zhou, Yu [1 ,2 ]
Zong, Chengqing [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Machine translation; Decoding; Semantics; Pipelines; Text recognition; Task analysis; Text image machine translation; contrastive learning; text image recognition; machine translation; RECOGNITION;
DOI
10.1109/TASLP.2023.3324540
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Text image machine translation (TIMT) aims to directly translate source-language text embedded in images into the target language. Most existing systems follow a cascaded pipeline from recognition to translation, which suffers from error propagation, parameter redundancy, and information reduction. An end-to-end model has the potential to alleviate these issues by bridging the recognition and translation models. However, it is challenged by limited data and the modality gap between text and image. In this paper, we propose a novel end-to-end model, namely Modal contrastive learning based End-to-end Text Image Machine Translation (METIMT), which alleviates these issues through an end-to-end text image machine translation architecture and modal contrastive learning. Specifically, an image encoder is designed to encode images into the same feature space as the corresponding text sentences, guided by an intra-modal and inter-modal contrastive learning module. To further promote research on text image machine translation, we have constructed one synthetic and two real-world datasets. Extensive experiments show that our lighter, faster model outperforms not only existing pipeline methods but also state-of-the-art end-to-end models on both synthetic and real-world evaluation sets. Our code and dataset will be released to the public.
Pages: 2153-2165
Page count: 13
Related papers
50 in total
  • [41] Research on Mongolian-Chinese machine translation based on the end-to-end neural network
    Qing-Dao-Er-Ji, Ren
    Su, Yila
    Wu, Nier
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (01)
  • [42] End-to-end Optimization of Machine Learning Prediction Queries
    Park, Kwanghyun
    Saur, Karla
    Banda, Dalitso
    Sen, Rathijit
    Interlandi, Matteo
    Karanasos, Konstantinos
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 587 - 601
  • [43] Provenance Tracking for End-to-End Machine Learning Pipelines
    Grafberger, Stefan
    Groth, Paul
    Schelter, Sebastian
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 1512 - 1512
  • [44] End-to-end Speech Translation via Cross-modal Progressive Training
    Ye, Rong
    Wang, Mingxuan
    Li, Lei
    INTERSPEECH 2021, 2021, : 2267 - 2271
  • [45] An End-to-End Framework Based on Vision-Language Fusion for Remote Sensing Cross-Modal Text-Image Retrieval
    He, Liu
    Liu, Shuyan
    An, Ran
    Zhuo, Yudong
    Tao, Jian
    MATHEMATICS, 2023, 11 (10)
  • [46] Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation
    Zhang, Ruiqing
    He, Zhongjun
    Wu, Hua
    Wang, Haifeng
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 7862 - 7874
  • [47] Mutual-Learning Improves End-to-End Speech Translation
    Zhao, Jiawei
    Luo, Wei
    Chen, Boxing
    Gilman, Andrew
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3989 - 3994
  • [48] End-to-end learning for arbitrary image style transfer
    Yoon, Y. B.
    Kim, M. S.
    Choi, H. C.
    ELECTRONICS LETTERS, 2018, 54 (22) : 1276 - 1277
  • [49] Learning End-to-End Lossy Image Compression: A Benchmark
    Hu, Yueyu
    Yang, Wenhan
    Ma, Zhan
    Liu, Jiaying
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (08) : 4194 - 4211
  • [50] A machine learning-based optimization for end-to-end latency in TSN networks
    Bezerra, Daniel
    Filho, Assis T. de Oliveira
    Rodrigues, Iago Richard
    Dantas, Marrone
    Barbosa, Gibson
    Souza, Ricardo
    Kelner, Judith
    Sadok, Djamel
    COMPUTER COMMUNICATIONS, 2022, 195 : 424 - 440