LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

Cited by: 25
Authors
Shen, Zejiang [1]
Zhang, Ruochen [2]
Dell, Melissa [3]
Lee, Benjamin Charles Germain [4]
Carlson, Jacob [3]
Li, Weining [5]
Affiliations
[1] Allen Inst AI, Seattle, WA 98103 USA
[2] Brown Univ, Providence, RI 02912 USA
[3] Harvard Univ, Cambridge, MA 02138 USA
[4] Univ Washington, Seattle, WA 98195 USA
[5] Univ Waterloo, Waterloo, ON, Canada
Keywords
Document image analysis; Deep learning; Layout analysis; Character recognition; Open source library; Toolkit;
DOI
10.1007/978-3-030-86549-8_9
Chinese Library Classification number
TP [Automation and Computer Technology];
Discipline classification code
0812;
Abstract
Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been ongoing efforts to improve reusability and simplify deep learning (DL) model development in disciplines like natural language processing and computer vision, none of them are optimized for challenges in the domain of DIA. This represents a major gap in the existing toolkit, as DIA is central to academic research across a wide range of disciplines in the social sciences and humanities. This paper introduces LayoutParser, an open-source library for streamlining the usage of DL in DIA research and applications. The core LayoutParser library comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks. To promote extensibility, LayoutParser also incorporates a community platform for sharing both pre-trained models and full document digitization pipelines. We demonstrate that LayoutParser is helpful for both lightweight and large-scale digitization pipelines in real-world use cases. The library is publicly available at https://layout-parser.github.io.
Pages: 131-146
Number of pages: 16
Related papers
50 in total
  • [1] Design and Development of Image Recognition Toolkit Based on Deep Learning
    Zhao, Hui
    Zhang, Hai-Xia
    Cao, Qing-Jiao
    Sun, Sheng-Juan
    Han, Xuanzhe
    Palaoag, Thelma D.
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (01)
  • [2] A Toolkit for Analysis of Deep Learning Experiments
    O'Donoghue, Jim
    Roantree, Mark
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS XV, 2016, 9897 : 134 - 145
  • [3] Deep learning for terahertz image denoising in nondestructive historical document analysis
    Dutta, Balaka
    Root, Konstantin
    Ullmann, Ingrid
    Wagner, Fabian
    Mayr, Martin
    Seuret, Mathias
    Thies, Mareike
    Stromer, Daniel
    Christlein, Vincent
    Schuer, Jan
    Maier, Andreas
    Huang, Yixing
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [5] A machine learning toolkit for CRISM image analysis
    Plebani, Emanuele
    Ehlmann, Bethany L.
    Leask, Ellen K.
    Fox, Valerie K.
    Dundar, M. Murat
    [J]. ICARUS, 2022, 376
  • [6] MEDICAL IMAGE ANALYSIS BASED ON DEEP LEARNING
    Dong, S.
    Wang, P.
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2018, 122 : 66 - 66
  • [7] Document Image Dewarping using Deep Learning
    Ramanna, Vijaya
    Bukhari, Saqib
    Dengel, Andreas
    [J]. ICPRAM: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2019, : 524 - 531
  • [8] A Unified Deep Learning Framework for ssTEM Image Restoration
    Deng, Shiyu
    Huang, Wei
    Chen, Chang
    Fu, Xueyang
    Xiong, Zhiwei
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (12) : 3734 - 3746
  • [9] Deep Learning Based Language and Orientation Recognition in Document Analysis
    Chen, Li
    Wang, Song
    Fan, Wei
    Sun, Jun
    Satoshi, Naoi
    [J]. 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 436 - 440
  • [10] DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis
    Vogtlin, Lars
    Scius-Bertrand, Anna
    Maergner, Paul
    Fischer, Andreas
    Ingold, Rolf
    [J]. PROCEEDINGS OF THE 2023 INTERNATIONAL WORKSHOP ON HISTORICAL DOCUMENT IMAGING AND PROCESSING, HIP 2023, 2023, : 61 - 66