Multi-modal page stream segmentation with convolutional neural networks

Cited by: 14
Authors
Wiedemann, Gregor [1 ]
Heyer, Gerhard [2 ]
Affiliations
[1] Hamburg Univ, Dept Comp Sci, Vogt-Kölln-Str 30, D-22527 Hamburg, Germany
[2] Univ Leipzig, Dept Comp Sci, Augustuspl 9, D-04109 Leipzig, Germany
Keywords
Page stream segmentation; Document flow segmentation; Convolutional neural nets; Text classification; Digital mailroom; Classification
DOI
10.1007/s10579-019-09476-2
Chinese Library Classification
TP39 [Applications of computers]
Subject classification codes
081203; 0835
Abstract
In recent years, the (retro-)digitization of paper-based files has become a major undertaking for private and public archives, as well as an important task in electronic mailroom applications. The workflow usually starts with batch scanning and optical character recognition (OCR) of the documents. For multi-page documents, preserving the document context is a major requirement. To facilitate workflows involving very large amounts of paper scans, page stream segmentation (PSS) is the task of automatically separating a stream of scanned images into coherent multi-page documents. In a digitization project together with a German federal archive, we developed a novel approach to PSS based on convolutional neural networks (CNN). In a first for this task, we combine visual information from the scanned images with semantic information from the OCR-ed texts. The multi-modal combination of features in a single classification architecture yields major improvements towards optimal document separation. Beyond multi-modality, our PSS approach profits from transfer learning and sequential page modeling. We achieve an accuracy of up to 95% on multi-page documents in our in-house dataset and up to 93% on a publicly available dataset.
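The abstract describes a multi-modal architecture that fuses a CNN over the scanned page image with a CNN over the page's OCR text and decides, page by page, whether a new document starts. The following PyTorch sketch illustrates that general idea only; it is not the authors' implementation, and the layer sizes, vocabulary size, input resolution, and the use of the previous page as sequential context are placeholder assumptions (the transfer-learning component mentioned in the abstract is omitted here).

```python
# Minimal sketch of a multi-modal page-break classifier for page stream
# segmentation (PSS). Illustrative only: architecture details are assumptions,
# not the authors' published model.
import torch
import torch.nn as nn


class PageBreakClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=100):
        super().__init__()
        # Visual branch: small CNN over a grayscale page scan (1 x 224 x 224).
        self.image_cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),                      # -> 32 features per page
        )
        # Textual branch: 1-D CNN over the OCR word embeddings of the page.
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.text_cnn = nn.Sequential(
            nn.Conv1d(emb_dim, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
            nn.Flatten(),                      # -> 64 features per page
        )
        # Fusion head: features of the previous and the current page are
        # concatenated (sequential context) and classified into
        # "page starts a new document" vs. "page continues the document".
        per_page = 32 + 64
        self.head = nn.Sequential(
            nn.Linear(2 * per_page, 128), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 2),
        )

    def encode_page(self, image, tokens):
        img_feat = self.image_cnn(image)                                   # (B, 32)
        txt_feat = self.text_cnn(self.embedding(tokens).transpose(1, 2))   # (B, 64)
        return torch.cat([img_feat, txt_feat], dim=1)

    def forward(self, prev_image, prev_tokens, cur_image, cur_tokens):
        prev = self.encode_page(prev_image, prev_tokens)
        cur = self.encode_page(cur_image, cur_tokens)
        return self.head(torch.cat([prev, cur], dim=1))                    # logits


if __name__ == "__main__":
    model = PageBreakClassifier()
    # One dummy page pair: grayscale scans plus padded OCR token ids.
    img = torch.randn(1, 1, 224, 224)
    toks = torch.randint(1, 30000, (1, 300))
    print(model(img, toks, img, toks).shape)  # torch.Size([1, 2])
```

In such a setup, a scanned stream would be segmented by running the classifier over consecutive page pairs and opening a new document wherever the "new document" class is predicted.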
Pages: 127-150
Number of pages: 24
Related papers
50 records in total
  • [41] Prediction of protein secondary structure by multi-modal neural networks
    Zhu, HX
    Yoshihara, I
    Yamamori, K
    PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, : 280 - 285
  • [42] Multi-modal Neural Networks for symbolic sequence pattern classification
    Zhu, HX
    Yoshihara, I
    Yamamori, K
    Yasunaga, M
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (07) : 1943 - 1952
  • [43] Speech recognition with multi-modal features based on neural networks
    Kim, Myung Won
    Ryu, Joung Woo
    Kim, Eun Ju
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 489 - 498
  • [44] Prediction of protein secondary structure by multi-modal neural networks
    Zhu, HX
    Yoshihara, I
    Yamamori, K
    Yasunaga, M
    RECENT ADVANCES IN SIMULATED EVOLUTION AND LEARNING, 2004, 2 : 682 - 697
  • [45] Reinforced multi-modal cyberbullying detection with subgraph neural networks
    Luo, Kai
    Zheng, Ce
    Guan, Zhenyu
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025, 16 (03) : 2161 - 2180
  • [46] Learning multi-modal recurrent neural networks with target propagation
    Manchev, Nikolay
    Spratling, Michael
    COMPUTATIONAL INTELLIGENCE, 2024, 40 (04)
  • [47] Multi-modal Complete Breast Segmentation
    Zolfagharnasab, Hooshiar
    Monteiro, Joao P.
    Teixeira, Joao F.
    Borlinhas, Filipa
    Oliveira, Helder P.
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2017), 2017, 10255 : 519 - 527
  • [48] Multi-modal semantic image segmentation
    Pemasiri, Akila
    Kien Nguyen
    Sridharan, Sridha
    Fookes, Clinton
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 202
  • [49] Referring Image Segmentation with Multi-Modal Feature Interaction and Alignment Based on Convolutional Nonlinear Spiking Neural Membrane Systems
    Sun, Siyan
    Wang, Peng
    Peng, Hong
    Liu, Zhicai
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2024, 34 (12)
  • [50] Utilizing Deep Convolutional Neural Networks and Non-Negative Matrix Factorization for Multi-Modal Image Fusion
    Das, Nripendra Narayan
    Govindasamy, Santhakumar
    Godla, Sanjiv Rao
    El-Ebiary, Yousef A. Baker
    Thenmozhi, E.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (09) : 597 - 606