Numerical Tuple Extraction from Tables with Pre-training

Cited: 0
Authors
Yang, Qingping [1 ,3 ]
Cao, Yixuan [1 ,3 ]
Luo, Ping [1 ,2 ,3 ]
Affiliations
[1] Univ Chinese Acad Sci, CAS, Inst Comp Technol, Beijing, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
tuple extraction; tabular representation; pre-training;
DOI
10.1145/3534678.3539460
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Tables are omnipresent on the web and in various vertical domains, storing massive amounts of valuable data. However, the great flexibility of table layouts hinders machines from understanding this valuable data. In order to unlock and utilize knowledge from tables, extracting data as numerical tuples is the first and critical step. As a form of relational data, numerical tuples have direct and transparent relationships between their elements and are therefore easy for machines to use. Extracting numerical tuples requires a deep understanding of the intricate correlations between cells. These correlations are presented implicitly in the texts and visual appearances of tables, and can be roughly classified into Hierarchy and Juxtaposition. Although many studies have made considerable progress in data extraction from tables, most of them consider only hierarchical relationships and neglect juxtapositions. Meanwhile, they evaluate their methods only on relatively small corpora. This paper proposes a new framework to extract numerical tuples from tables and evaluates it on a large test set. Specifically, we convert this task into a relation extraction problem between cells. To represent cells with their intricate correlations in tables, we propose a BERT-based pre-trained language model, TableLM, to encode tables with diverse layouts. To evaluate the framework, we collect a large finance dataset that includes 19,264 tables and 604K tuples. Extensive experiments on the dataset demonstrate the superiority of our framework compared to a well-designed baseline.
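The abstract's core idea is to recast tuple extraction as relation extraction over cell pairs, with each pair labeled by a correlation type such as Hierarchy or Juxtaposition. The following minimal sketch illustrates that framing on a toy table; the positional heuristics in `classify_pair` are purely illustrative stand-ins for the paper's TableLM-based classifier, and all names here are hypothetical.

```python
from itertools import combinations

# Toy table: each cell is (row, col, text). A real system would score
# pairs with learned TableLM embeddings; here simple positional rules
# stand in, just to show the pairwise-relation formulation.
cells = [
    (0, 0, "Region"), (0, 1, "Revenue"),
    (1, 0, "North"),  (1, 1, "120"),
    (2, 0, "South"),  (2, 1, "95"),
]

def classify_pair(a, b):
    """Illustrative heuristic: a header cell (row 0) over a cell in the
    same column -> hierarchy; cells sharing a row -> juxtaposition."""
    (ra, ca, _), (rb, cb, _) = a, b
    if ca == cb and min(ra, rb) == 0:
        return "hierarchy"
    if ra == rb:
        return "juxtaposition"
    return "none"

# Enumerate all cell pairs and keep those with a predicted relation;
# numerical tuples such as (North, Revenue, 120) can then be assembled
# from the resulting relation graph.
relations = {
    (a[2], b[2]): classify_pair(a, b)
    for a, b in combinations(cells, 2)
    if classify_pair(a, b) != "none"
}
```

For example, `relations[("Revenue", "120")]` comes out as `"hierarchy"` (header over data), while `("North", "120")` is `"juxtaposition"` (same row); combining the two links yields the tuple (North, Revenue, 120).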
Pages: 2233-2241
Page count: 9
Related Papers
50 in total
  • [31] Speech Pre-training with Acoustic Piece
    Ren, Shuo
    Liu, Shujie
    Wu, Yu
    Zhou, Long
    Wei, Furu
    INTERSPEECH 2022, 2022, : 2648 - 2652
  • [32] Unsupervised Pre-Training for Detection Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12772 - 12782
  • [33] Structural Pre-training for Dialogue Comprehension
    Zhang, Zhuosheng
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5134 - 5145
  • [34] Simulated SAR for ATR pre-training
    Willis, Christopher J.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN DEFENSE APPLICATIONS III, 2021, 11870
  • [35] Robot Learning with Sensorimotor Pre-training
    Radosavovic, Ilija
    Shi, Baifeng
    Fu, Letian
    Goldberg, Ken
    Darrell, Trevor
    Malik, Jitendra
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [36] Rethinking pre-training on medical imaging
    Wen, Yang
    Chen, Leiting
    Deng, Yu
    Zhou, Chuan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 78
  • [37] Event Camera Data Pre-training
    Yang, Yan
    Pan, Liyuan
    Liu, Liu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10665 - 10675
  • [38] Pre-training Methods in Information Retrieval
    Fan, Yixing
    Xie, Xiaohui
    Cai, Yinqiong
    Chen, Jia
    Ma, Xinyu
    Li, Xiangsheng
    Zhang, Ruqing
    Guo, Jiafeng
    FOUNDATIONS AND TRENDS IN INFORMATION RETRIEVAL, 2022, 16 (03): : 178 - 317
  • [39] Pre-training in Medical Data: A Survey
    Qiu, Yixuan
    Lin, Feng
    Chen, Weitong
    Xu, Miao
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (02) : 147 - 179
  • [40] Quality Diversity for Visual Pre-Training
    Chavhan, Ruchika
    Gouk, Henry
    Li, Da
    Hospedales, Timothy
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5361 - 5371