Harvest - a System for Creating Structured Rate Filing Data from Filing PDFs

Cited by: 0
Authors
Tekin, Ender [1]
You, Qian [2]
Conathan, Devin M. [1]
Fung, Glenn M. [1]
Kneubuehl, Thomas S. [1]
Affiliations
[1] Amer Family Mutual Insurance Co SI, Madison, WI 53783 USA
[2] Coupang Corp, Seoul, South Korea
Source
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a machine-learning-guided process that can efficiently extract factor tables from unstructured rate filing documents. Our approach combines multiple deep-learning-based models that work in tandem to create structured representations of tabular data present in unstructured documents such as PDF files. On the machine-learning side, the process combines CNNs to detect tables, language-based models to extract table metadata, and conventional computer vision techniques to improve the accuracy of the extracted tabular data. The extracted tabular data is then validated through an intuitive user interface. This process, which we call Harvest, significantly reduces the time needed to extract tabular information from PDF files, enabling analysis of such data at a speed and scale that was previously unattainable.
Pages: 12414-12422
Number of pages: 9
Related papers
50 in total
  • [31] NONPARAMETRIC IDENTIFICATION OF MISO HAMMERSTEIN SYSTEM FROM STRUCTURED DATA
    Pawel Wachel
    Przemyslaw Śliwiński
    Zygmunt Hasiewicz
    Journal of Systems Science and Systems Engineering, 2015, (01) : 68 - 80
  • [32] Creating a Hierarchical Fuzzy System to Assess Physical Activity Levels from Fitbit Data
    Chaudhry, F. A.
    Garibaldi, J. M.
    Qureshi, N.
    ADVANCES IN COMPUTATIONAL INTELLIGENCE SYSTEMS, 2022, 1409 : 337 - 343
  • [33] An interactive system for creating object models from range data based on simulated annealing
    Hoff, WA
    Hood, FW
    King, RH
    1997 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION - PROCEEDINGS, VOLS 1-4, 1997, : 2559 - 2564
  • [34] A generic simulation of a parking system: Creating the model's structure from a data base
    März, L
    Richter, H
    MODELLING AND SIMULATION 2001, 2001, : 922 - 925
  • [35] Structured modelling from data and optimal control of the cooling system of a large business center
    Terzi, E.
    Fagiano, L.
    Farina, M.
    Scattolini, R.
    JOURNAL OF BUILDING ENGINEERING, 2020, 28
  • [36] Enhancing a Location-based Recommendation System by Enrichment with Structured Data from the Web
    Schmachtenberg, Max
    Strufe, Thorsten
    Paulheim, Heiko
    4TH INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, MINING AND SEMANTICS, 2014,
  • [37] Reading from Scratch - A Vision-System for Reading Data on Micro-structured Surfaces
    Dragon, Ralf
    Becker, Christian
    Rosenhahn, Bodo
    Ostermann, Joern
    PATTERN RECOGNITION, PROCEEDINGS, 2009, 5748 : 402 - 411
  • [38] The Acquisition of Structured Clinical Data from a Document-Based Electronic Medical Record System
    Takeda, Toshihiro
    Zhang, Dongyao
    Wada, Shoya
    Nakagawa, Akito
    Sugimoto, Kento
    Manabe, Shirou
    Matsumura, Yasushi
    MEDINFO 2019: HEALTH AND WELLBEING E-NETWORKS FOR ALL, 2019, 264 : 1600 - 1601
  • [39] Public Facilities Recommendation System based on Structured and Unstructured Data Extraction from Multi-Channel Data Sources
    Putri, Alifa Nurani
    Akbar, Saiful
    Sunindyo, Wikan Danar
    2015 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE), 2015, : 185 - 190
  • [40] A vision for creating advanced products from EOS core system data to support geospatial applications in the state of Texas
    Tapley, BD
    Crawford, MM
    Howard, T
    Hutchison, KD
    Smith, S
    Wells, GL
    IGARSS 2001: SCANNING THE PRESENT AND RESOLVING THE FUTURE, VOLS 1-7, PROCEEDINGS, 2001, : 843 - 845