Multi-modal page stream segmentation with convolutional neural networks

Times cited: 14

Authors:

Wiedemann, Gregor [1]

Heyer, Gerhard [2]

Affiliations:

[1] Hamburg Univ, Dept Comp Sci, Vogt Kolln Str 30, D-22527 Hamburg, Germany

[2] Univ Leipzig, Dept Comp Sci, Augustuspl 9, D-04109 Leipzig, Germany

Source:

LANGUAGE RESOURCES AND EVALUATION | 2021 / Vol. 55 / No. 1

Keywords:

Page stream segmentation; Document flow segmentation; Convolutional neural nets; Text classification; Digital mailroom; CLASSIFICATION;

DOI:

10.1007/s10579-019-09476-2

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

In recent years, (retro-)digitizing paper-based files has become a major undertaking for private and public archives as well as an important task in electronic mailroom applications. The first steps of such a workflow usually involve batch scanning and optical character recognition (OCR) of documents. For multi-page documents, preserving the document context is a major requirement. To facilitate workflows involving very large numbers of paper scans, page stream segmentation (PSS) is the task of automatically separating a stream of scanned images into coherent multi-page documents. In a digitization project with a German federal archive, we developed a novel approach to PSS based on convolutional neural networks (CNN). Ours is the first approach to combine visual information from scanned images with semantic information from OCR-ed texts for this task. Combining both modalities in a single classification architecture yields major improvements towards optimal document separation. In addition to multimodality, our PSS approach benefits from transfer learning and sequential page modeling. We achieve accuracies of up to 95% on multi-page documents in our in-house dataset and up to 93% on a publicly available dataset.
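The abstract defines PSS as separating a page stream into coherent documents. One common way to operationalize this, assumed here rather than stated in this record, is a per-page binary decision: does page i start a new document, or does it continue the previous one? The minimal Python sketch below only shows how such decisions turn into document groups; `segment_stream` and its boolean `starts_new_doc` input are hypothetical names for illustration, and the classifier that would produce the decisions (for example, a CNN over page images and OCR text) is not shown.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def segment_stream(pages: Sequence[T], starts_new_doc: Sequence[bool]) -> List[List[T]]:
    """Group a stream of pages into documents.

    Hypothetical helper: starts_new_doc[i] is assumed to be a per-page
    decision (e.g. from some page classifier) that page i is the first
    page of a new document.
    """
    if len(pages) != len(starts_new_doc):
        raise ValueError("expected exactly one boundary decision per page")

    documents: List[List[T]] = []
    for page, is_start in zip(pages, starts_new_doc):
        # The first page of the stream always opens a document,
        # regardless of the decision for that page.
        if is_start or not documents:
            documents.append([page])
        else:
            # Otherwise the page continues the current document.
            documents[-1].append(page)
    return documents


# Five scanned pages, boundaries predicted before pages 1 and 3.
print(segment_stream(["p1", "p2", "p3", "p4", "p5"], [True, False, True, False, False]))
# → [['p1', 'p2'], ['p3', 'p4', 'p5']]
```

Framing segmentation as one decision per page keeps the output well defined for any stream length. Whatever model produces the decisions, this grouping step stays the same.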

Pages: 127-150

Number of pages: 24
