Rule Driven Spreadsheet Data Extraction from Statistical Tables: Case Study

Cited by: 3
Authors
Paramonov, Viacheslav [1, 2]
Shigarov, Alexey [1, 2]
Vetrova, Varvara [1, 3]
Affiliations
[1] Russian Acad Sci, Matrosov Inst Syst Dynam & Control Theory, Siberian Branch, Irkutsk, Russia
[2] Irkutsk State Univ, Inst Math & Informat Technol, Irkutsk, Russia
[3] Univ Canterbury, Sch Math & Stat, Christchurch, New Zealand
Funding
Russian Science Foundation
Keywords
Table understanding; Data transformation; Table extraction; Table analysis; Spreadsheet; Table header; Heuristics; Case study; Rules
DOI
10.1007/978-3-030-88304-1_7
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Spreadsheet tables are one of the most commonly used formats for organising and storing statistical, financial, accounting and other types of data. This form of data representation is widely used in science, education, engineering, and business. The key feature of spreadsheet tables is that they are generally created by people to be used by other people rather than by automated programs. During spreadsheet creation, usually no consideration is given to the possibility of further automated data processing. This leads to a large variety of possible spreadsheet table structures and complicates automated extraction of table content and table understanding. One of the key factors that influence the quality of table understanding by machines is the correctness of the header structure, for example, the position of cells and the relations between them. In this paper, we present a case study of a tabular data extraction approach and estimate its performance on a variety of datasets. The rule-driven software platform TabbyXL was used for tabular data extraction and canonicalisation. The experiment was conducted on real-world tables from the SAUS200 corpus (The 2010 Statistical Abstract of the United States). For the evaluation, we used three variants: spreadsheet tables as they are presented in SAUS; the same tables with an automatically corrected header structure; and tables whose header structure was corrected by experts. The case study results demonstrate the importance of header structure correctness for automated table processing and understanding. The ground-truth preparation procedures, examples of rules describing relationships between table elements, and the results of the evaluation are presented in the paper.
Pages: 84-95
Page count: 12