LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

Cited by: 25

Authors:

Shen, Zejiang [1]

Zhang, Ruochen [2]

Dell, Melissa [3]

Lee, Benjamin Charles Germain [4]

Carlson, Jacob [3]

Li, Weining [5]

Affiliations:

[1] Allen Inst AI, Seattle, WA 98103 USA

[2] Brown Univ, Providence, RI 02912 USA

[3] Harvard Univ, Cambridge, MA 02138 USA

[4] Univ Washington, Seattle, WA 98195 USA

[5] Univ Waterloo, Waterloo, ON, Canada

Source:

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT I | 2021 / Vol. 12821

Keywords:

Document image analysis; Deep learning; Layout analysis; Character recognition; Open source library; Toolkit

DOI:

10.1007/978-3-030-86549-8_9

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been on-going efforts to improve reusability and simplify deep learning (DL) model development in disciplines like natural language processing and computer vision, none of them are optimized for challenges in the domain of DIA. This represents a major gap in the existing toolkit, as DIA is central to academic research across a wide range of disciplines in the social sciences and humanities. This paper introduces LayoutParser, an open-source library for streamlining the usage of DL in DIA research and applications. The core LayoutParser library comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks. To promote extensibility, LayoutParser also incorporates a community platform for sharing both pre-trained models and full document digitization pipelines. We demonstrate that LayoutParser is helpful for both lightweight and large-scale digitization pipelines in real-world use cases. The library is publicly available at https://layout-parser.github.io.


Pages: 131-146

Page count: 16
