Modal Contrastive Learning Based End-to-End Text Image Machine Translation

Cited by: 0
Authors
Ma, Cong [1, 2]
Han, Xu [1, 2]
Wu, Linghui [1, 2]
Zhang, Yaping [1, 2]
Zhao, Yang [1, 2]
Zhou, Yu [1, 2]
Zong, Chengqing [1, 2]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Machine translation; Decoding; Semantics; Pipelines; Text recognition; Task analysis; Text image machine translation; contrastive learning; text image recognition; machine translation; RECOGNITION;
DOI
10.1109/TASLP.2023.3324540
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Text image machine translation (TIMT) aims to directly translate source-language text embedded in images into the target language. Most existing systems follow a cascaded pipeline paradigm from recognition to translation, which suffers from error propagation, parameter redundancy, and information reduction. An end-to-end model has the potential to alleviate these issues by bridging the recognition and translation models. However, the challenges are limited data and the modality gap between text and image. In this paper, we propose a novel end-to-end model, namely Modal contrastive learning based End-to-end Text Image Machine Translation (METIMT), which alleviates these issues through an end-to-end text image machine translation architecture and modal contrastive learning. Specifically, an image encoder is designed to encode images into the same feature space as the corresponding text sentences, with the guidance of an intra-modal and inter-modal contrastive learning module. To further promote research on text image machine translation, we have constructed one synthetic and two real-world datasets. Extensive experiments show that our lighter, faster model outperforms not only existing pipeline methods but also state-of-the-art end-to-end models on both synthetic and real-world evaluation sets. Our code and dataset will be released to the public.
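The abstract does not give the loss used by the contrastive module, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE objective, for pulling an image feature toward the feature of its paired sentence. The function names (`l2_normalize`, `info_nce`, `inter_modal_loss`), the temperature value, and the use of cosine similarity are all assumptions for illustration and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length (assumes at least one non-zero entry)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where anchors[i] is paired with positives[i].

    All other rows of `positives` act as in-batch negatives for anchor i.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def inter_modal_loss(image_feats, text_feats, temperature=0.1):
    """Hypothetical inter-modal term: average of image-to-text and text-to-image InfoNCE."""
    return 0.5 * (
        info_nce(image_feats, text_feats, temperature)
        + info_nce(text_feats, image_feats, temperature)
    )


# Toy batch: two sentences and two image features in a shared 2-D space.
text = [[1.0, 0.0], [0.0, 1.0]]
image_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each image close to its own sentence
image_swapped = [[0.1, 0.9], [0.9, 0.1]]   # each image close to the wrong sentence

loss_aligned = inter_modal_loss(image_aligned, text)
loss_swapped = inter_modal_loss(image_swapped, text)
```

Under these assumptions, `loss_aligned` comes out smaller than `loss_swapped`, which is the behaviour a contrastive alignment term is meant to reward. An intra-modal term could reuse `info_nce` on two views of the same modality (for example, two augmentations of one image), though how the paper constructs such pairs is not stated in the abstract.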
Pages: 2153-2165
Page count: 13
Related papers
50 in total
  • [21] An End-to-End Sequence Learning Approach for Text Extraction and Recognition from Scene Image
    Lalitha, G.
    Lavanya, B.
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2022, 22 (07): : 220 - 228
  • [22] End-to-end Speech Translation by Integrating Cross-modal Information
    Liu Y.-C.
    Zong C.-Q.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (04): : 1837 - 1849
  • [23] Scene text spotting based on end-to-end
    Wei G.
    Rong W.
    Liang Y.
    Xiao X.
    Liu X.
    Journal of Intelligent and Fuzzy Systems, 2021, 40 (05): : 8871 - 8881
  • [24] End-to-End Learning for Image Burst Deblurring
    Wieschollek, Patrick
    Schoelkopf, Bernhard
    Lensch, Hendrik P. A.
    Hirsch, Michael
    COMPUTER VISION - ACCV 2016, PT IV, 2017, 10114 : 35 - 51
  • [25] MINTZAI: End-to-end Deep Learning for Speech Translation
    Etchegoyhen, Thierry
    Arzelus, Haritz
    Gete, Harritxu
    Alvarez, Aitor
    Hernaez, Inma
    Navas, Eva
    Gonzalez-Docasal, Ander
    Osacar, Jaime
    Benites, Edson
    Ellakuria, Igor
    Calonge, Eusebi
    Martin, Maite
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2020, (65): : 97 - 100
  • [26] End-to-end Learning for Encrypted Image Retrieval
    Feng, Qihua
    Li, Peiya
    Lu, ZhiXun
    Liu, Guan
    Huang, Feiran
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1839 - 1845
  • [27] X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
    Ma, Yiwei
    Xu, Guohai
    Sun, Xiaoshuai
    Yan, Ming
    Zhang, Ji
    Ji, Rongrong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [28] Machine Learning Based End-to-End Constellation Training for Communication Systems
    Lin, Po-Chiang
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1768 - 1773
  • [29] End-to-end entity-aware neural machine translation
    Shufang Xie
    Yingce Xia
    Lijun Wu
    Yiqing Huang
    Yang Fan
    Tao Qin
    Machine Learning, 2022, 111 : 1181 - 1203
  • [30] End-to-end entity-aware neural machine translation
    Xie, Shufang
    Xia, Yingce
    Wu, Lijun
    Huang, Yiqing
    Fan, Yang
    Qin, Tao
    MACHINE LEARNING, 2022, 111 (03) : 1181 - 1203