TableVLM: Multi-modal Pre-training for Table Structure Recognition

被引:0
|
作者
Chen, Leiyuan [1 ,2 ]
Huang, Chengsong [1 ,2 ]
Zheng, Xiaoqing [1 ,2 ]
Lin, Jinshu [3 ]
Huang, Xuanjing [1 ,2 ]
机构
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] Hundsun, Hangzhou, Peoples R China
基金
中国国家自然科学基金;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Tables are widely used in research and business, and are suitable for human consumption, but not easily machine-processable, particularly when tables are present in images. One of the main challenges to extracting data from images of tables is to accurately recognize table structures, especially for complex tables with cross rows and columns. In this study, we propose a novel multi-modal pre-training model for table structure recognition, named TableVLM. With a two-stream multi-modal transformer-based encoder-decoder architecture, TableVLM learns to capture rich table structure-related features by multiple carefullydesigned unsupervised objectives inspired by the notion of masked visual-language modeling. To pre-train this model, we also created a dataset, called ComplexTable, which consists of 1, 000K samples to be released publicly. Experiment results show that the model built on pre-trained TableVLM can improve the performance up to 1.97% in tree-editing-distancescore on ComplexTable.
引用
收藏
页码:2437 / 2449
页数:13
相关论文
共 50 条
  • [1] MULTI-MODAL PRE-TRAINING FOR AUTOMATED SPEECH RECOGNITION
    Chan, David M.
    Ghosh, Shalini
    Chakrabarty, Debmalya
    Hoffmeister, Bjorn
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 246 - 250
  • [2] Multi-Modal Contrastive Pre-training for Recommendation
    Liu, Zhuang
    Ma, Yunpu
    Schubert, Matthias
    Ouyang, Yuanxin
    Xiong, Zhang
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 99 - 108
  • [3] MGeo: Multi-Modal Geographic Language Model Pre-Training
    Ding, Ruixue
    Chen, Boli
    Xie, Pengjun
    Huang, Fei
    Li, Xin
    Zhang, Qiang
    Xu, Yao
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 185 - 194
  • [4] Dynamic facial expression recognition with pseudo-label guided multi-modal pre-training
    Yin, Bing
    Yin, Shi
    Liu, Cong
    Zhang, Yanyong
    Xi, Changfeng
    Yin, Baocai
    Ling, Zhenhua
    [J]. IET COMPUTER VISION, 2024, 18 (01) : 33 - 45
  • [5] Real-time Emotion Pre-Recognition in Conversations with Contrastive Multi-modal Dialogue Pre-training
    Ju, Xincheng
    Zhang, Dong
    Zhu, Suyang
    Li, Junhui
    Li, Shoushan
    Zhou, Guodong
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1045 - 1055
  • [6] Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion
    Yan, Zhiqiang
    Li, Xiang
    Wang, Kun
    Zhang, Zhenyu
    Li, Jun
    Yang, Jian
    [J]. COMPUTER VISION - ECCV 2022, PT I, 2022, 13661 : 378 - 395
  • [7] Versatile Multi-Modal Pre-Training for Human-Centric Perception
    Hong, Fangzhou
    Pan, Liang
    Cai, Zhongang
    Liu, Ziwei
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16135 - 16145
  • [8] Graph-Text Multi-Modal Pre-training for Medical Representation Learning
    Park, Sungjin
    Bae, Seongsu
    Kim, Jiho
    Kim, Tackeun
    Choi, Edward
    [J]. CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, VOL 174, 2022, 174 : 261 - 281
  • [9] MMPT'21: International JointWorkshop on Multi-Modal Pre-Training for Multimedia Understanding
    Liu, Bei
    Fu, Jianlong
    Chen, Shizhe
    Jin, Qin
    Hauptmann, Alexander
    Rui, Yong
    [J]. PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 694 - 695
  • [10] The Effectiveness of Self-supervised Pre-training for Multi-modal Endometriosis Classification
    Butler, David
    Wang, Hu
    Zhang, Yuan
    To, Minh-Son
    Condous, George
    Leonardi, Mathew
    Knox, Steven
    Avery, Jodie
    Hull, M. Louise
    Carneiro, Gustavo
    [J]. 2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC, 2023,