Multi-modal page stream segmentation with convolutional neural networks

Cited by: 14
Authors
Wiedemann, Gregor [1 ]
Heyer, Gerhard [2 ]
Affiliations
[1] Hamburg Univ, Dept Comp Sci, Vogt-Kölln-Str 30, D-22527 Hamburg, Germany
[2] Univ Leipzig, Dept Comp Sci, Augustuspl 9, D-04109 Leipzig, Germany
Keywords
Page stream segmentation; Document flow segmentation; Convolutional neural nets; Text classification; Digital mailroom; Classification
DOI
10.1007/s10579-019-09476-2
Chinese Library Classification
TP39 [Applications of computers]
Subject classification codes
081203; 0835
Abstract
In recent years, the (retro-)digitization of paper-based files has become a major undertaking for private and public archives, as well as an important task in electronic mailroom applications. The workflow usually starts with batch scanning and optical character recognition (OCR) of the documents. For multi-page documents, preserving the document context is a major requirement. To facilitate workflows involving very large amounts of paper scans, page stream segmentation (PSS) is the task of automatically separating a stream of scanned images into coherent multi-page documents. In a digitization project together with a German federal archive, we developed a novel approach to PSS based on convolutional neural networks (CNN). In a first for this task, we combine visual information from the scanned images with semantic information from the OCR-ed texts. The multi-modal combination of features in a single classification architecture yields major improvements towards optimal document separation. Beyond multi-modality, our PSS approach profits from transfer learning and sequential page modeling. We achieve an accuracy of up to 95% on multi-page documents in our in-house dataset and up to 93% on a publicly available dataset.
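The abstract describes a multi-modal architecture that fuses a CNN over the scanned page image with a CNN over the page's OCR text and decides, page by page, whether a new document starts. The following PyTorch sketch illustrates that general idea only; it is not the authors' implementation, and the layer sizes, vocabulary size, input resolution, and the use of the previous page as sequential context are placeholder assumptions (the transfer-learning component mentioned in the abstract is omitted here).

```python
# Minimal sketch of a multi-modal page-break classifier for page stream
# segmentation (PSS). Illustrative only: architecture details are assumptions,
# not the authors' published model.
import torch
import torch.nn as nn


class PageBreakClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=100):
        super().__init__()
        # Visual branch: small CNN over a grayscale page scan (1 x 224 x 224).
        self.image_cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),                      # -> 32 features per page
        )
        # Textual branch: 1-D CNN over the OCR word embeddings of the page.
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.text_cnn = nn.Sequential(
            nn.Conv1d(emb_dim, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
            nn.Flatten(),                      # -> 64 features per page
        )
        # Fusion head: features of the previous and the current page are
        # concatenated (sequential context) and classified into
        # "page starts a new document" vs. "page continues the document".
        per_page = 32 + 64
        self.head = nn.Sequential(
            nn.Linear(2 * per_page, 128), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 2),
        )

    def encode_page(self, image, tokens):
        img_feat = self.image_cnn(image)                                   # (B, 32)
        txt_feat = self.text_cnn(self.embedding(tokens).transpose(1, 2))   # (B, 64)
        return torch.cat([img_feat, txt_feat], dim=1)

    def forward(self, prev_image, prev_tokens, cur_image, cur_tokens):
        prev = self.encode_page(prev_image, prev_tokens)
        cur = self.encode_page(cur_image, cur_tokens)
        return self.head(torch.cat([prev, cur], dim=1))                    # logits


if __name__ == "__main__":
    model = PageBreakClassifier()
    # One dummy page pair: grayscale scans plus padded OCR token ids.
    img = torch.randn(1, 1, 224, 224)
    toks = torch.randint(1, 30000, (1, 300))
    print(model(img, toks, img, toks).shape)  # torch.Size([1, 2])
```

In such a setup, a scanned stream would be segmented by running the classifier over consecutive page pairs and opening a new document wherever the "new document" class is predicted.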
Pages: 127-150
Number of pages: 24
Related papers
50 records in total
  • [41] Prediction of protein secondary structure by multi-modal neural networks
    Zhu, HX
    Yoshihara, I
    Yamamori, K
    PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, : 280 - 285
  • [42] Multi-modal Neural Networks for symbolic sequence pattern classification
    Zhu, HX
    Yoshihara, I
    Yamamori, K
    Yasunaga, M
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (07) : 1943 - 1952
  • [43] Speech recognition with multi-modal features based on neural networks
    Kim, Myung Won
    Ryu, Joung Woo
    Kim, Eun Ju
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 489 - 498
  • [44] Prediction of protein secondary structure by multi-modal neural networks
    Zhu, HX
    Yoshihara, I
    Yamamori, K
    Yasunaga, M
    RECENT ADVANCES IN SIMULATED EVOLUTION AND LEARNING, 2004, 2 : 682 - 697
  • [45] Reinforced multi-modal cyberbullying detection with subgraph neural networks
    Luo, Kai
    Zheng, Ce
    Guan, Zhenyu
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025, 16 (03) : 2161 - 2180
  • [46] Learning multi-modal recurrent neural networks with target propagation
    Manchev, Nikolay
    Spratling, Michael
    COMPUTATIONAL INTELLIGENCE, 2024, 40 (04)
  • [47] Multi-modal Complete Breast Segmentation
    Zolfagharnasab, Hooshiar
    Monteiro, Joao P.
    Teixeira, Joao F.
    Borlinhas, Filipa
    Oliveira, Helder P.
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2017), 2017, 10255 : 519 - 527
  • [48] Multi-modal semantic image segmentation
    Pemasiri, Akila
    Kien Nguyen
    Sridharan, Sridha
    Fookes, Clinton
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 202
  • [49] Referring Image Segmentation with Multi-Modal Feature Interaction and Alignment Based on Convolutional Nonlinear Spiking Neural Membrane Systems
    Sun, Siyan
    Wang, Peng
    Peng, Hong
    Liu, Zhicai
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2024, 34 (12)
  • [50] Utilizing Deep Convolutional Neural Networks and Non-Negative Matrix Factorization for Multi-Modal Image Fusion
    Das, Nripendra Narayan
    Govindasamy, Santhakumar
    Godla, Sanjiv Rao
    El-Ebiary, Yousef A. Baker
    Thenmozhi, E.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (09) : 597 - 606