Multi-modal page stream segmentation with convolutional neural networks

Cited by: 14
Authors
Wiedemann, Gregor [1]
Heyer, Gerhard [2]
Affiliations
[1] Hamburg Univ, Dept Comp Sci, Vogt Kolln Str 30, D-22527 Hamburg, Germany
[2] Univ Leipzig, Dept Comp Sci, Augustuspl 9, D-04109 Leipzig, Germany
Keywords
Page stream segmentation; Document flow segmentation; Convolutional neural nets; Text classification; Digital mailroom; CLASSIFICATION;
DOI
10.1007/s10579-019-09476-2
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In recent years, (retro-)digitizing paper-based files has become a major undertaking for private and public archives as well as an important task in electronic mailroom applications. As first steps, the workflow usually involves batch scanning and optical character recognition (OCR) of documents. In the case of multi-page documents, the preservation of document contexts is a major requirement. To facilitate workflows involving very large amounts of paper scans, page stream segmentation (PSS) is the task of automatically separating a stream of scanned images into coherent multi-page documents. In a digitization project together with a German federal archive, we developed a novel approach for PSS based on convolutional neural networks (CNN). As a first project, we combine visual information from scanned images with semantic information from OCR-ed texts for this task. The multi-modal combination of features in a single classification architecture allows for major improvements towards optimal document separation. In addition to multimodality, our PSS approach profits from transfer learning and sequential page modeling. We achieve an accuracy of up to 95% on multi-page documents on our in-house dataset and up to 93% on a publicly available dataset.
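As an illustration of the task definition only, the sketch below shows how a page stream could be split once every page carries a "starts a new document" decision. Treating PSS as a per-page binary decision is an assumption made here for illustration, and the function name `segment_page_stream` and its parameters are hypothetical; the sketch does not reproduce the CNN classifier or any other part of the authors' method.

```python
from typing import List, Sequence


def segment_page_stream(
    pages: Sequence[str],
    starts_new_document: Sequence[bool],
) -> List[List[str]]:
    """Group a stream of pages into documents.

    `pages` holds page identifiers in scan order. `starts_new_document`
    holds one hypothetical classifier decision per page: True if the page
    is predicted to open a new document, False if it continues the
    previous one.
    """
    if len(pages) != len(starts_new_document):
        raise ValueError("need exactly one decision per page")

    documents: List[List[str]] = []
    for page, is_start in zip(pages, starts_new_document):
        # The very first page always opens a document, whatever the decision.
        if is_start or not documents:
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents


if __name__ == "__main__":
    stream = ["scan_001", "scan_002", "scan_003", "scan_004"]
    decisions = [True, False, True, False]
    print(segment_page_stream(stream, decisions))
```

In this framing, a single wrong decision either merges two documents or splits one, which is why per-page accuracy matters for the separation quality of the whole stream.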
Pages: 127-150
Page count: 24