Harvest - a System for Creating Structured Rate Filing Data from Filing PDFs

Cited by: 0
Authors
Tekin, Ender [1]
You, Qian [2]
Conathan, Devin M. [1]
Fung, Glenn M. [1]
Kneubuehl, Thomas S. [1]
Affiliations
[1] Amer Family Mutual Insurance Co SI, Madison, WI 53783 USA
[2] Coupang Corp, Seoul, South Korea
Source
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022
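Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract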
We present a machine-learning-guided process that can efficiently extract factor tables from unstructured rate filing documents. Our approach combines multiple deep-learning-based models that work in tandem to create structured representations of tabular data present in unstructured documents such as PDF files. On the machine-learning side, this process combines CNNs to detect tables, language-based models to extract table metadata, and conventional computer vision techniques to improve the accuracy of the extracted tabular data. The extracted tabular data is then validated through an intuitive user interface. This process, which we call Harvest, significantly reduces the time needed to extract tabular information from PDF files, enabling analysis of such data at a speed and scale that was previously unattainable.
Pages: 12414-12422
Number of pages: 9