Multi-modal page stream segmentation with convolutional neural networks

Times cited: 14
Authors
Wiedemann, Gregor [1]
Heyer, Gerhard [2]
Affiliations
[1] Hamburg Univ, Dept Comp Sci, Vogt Kolln Str 30, D-22527 Hamburg, Germany
[2] Univ Leipzig, Dept Comp Sci, Augustuspl 9, D-04109 Leipzig, Germany
Keywords
Page stream segmentation; Document flow segmentation; Convolutional neural nets; Text classification; Digital mailroom; Classification
DOI
10.1007/s10579-019-09476-2
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In recent years, (retro-)digitizing paper-based files has become a major undertaking for private and public archives, as well as an important task in electronic mailroom applications. The workflow usually begins with batch scanning and optical character recognition (OCR) of documents. For multi-page documents, preserving document context is a major requirement. To support workflows involving very large volumes of paper scans, page stream segmentation (PSS) is the task of automatically separating a stream of scanned images into coherent multi-page documents. In a digitization project with a German federal archive, we developed a novel approach to PSS based on convolutional neural networks (CNN). Ours is the first project to combine visual information from scanned images with semantic information from OCR-ed texts for this task. Combining multi-modal features in a single classification architecture yields major improvements in document separation. Beyond multimodality, our PSS approach benefits from transfer learning and sequential page modeling. We achieve an accuracy of up to 95% on multi-page documents in our in-house dataset and up to 93% on a publicly available dataset.
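The abstract describes PSS as splitting a stream of scanned pages into documents. One common way to operationalize this, assumed here rather than taken from the paper, is to label every page as either starting a new document or continuing the previous one. The minimal Python sketch below shows only the bookkeeping around such a classifier; the CNN that would produce the per-page decisions is omitted, and the helper names `segment_stream`, `pair_with_previous` and `boundary_accuracy` are hypothetical. The page-level accuracy shown is an illustration and may not match the metric the authors report.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def segment_stream(is_first_page: Sequence[bool]) -> List[Tuple[int, int]]:
    """Hypothetical helper: turn per-page 'new document starts here' flags
    into (start, end) page ranges, with end exclusive.

    The first page always opens a document, whatever its flag says."""
    docs: List[Tuple[int, int]] = []
    start = 0
    for i, flag in enumerate(is_first_page):
        # A page flagged as a document start closes the document opened before it.
        if flag and i > start:
            docs.append((start, i))
            start = i
    if is_first_page:
        # Close the last open document at the end of the stream.
        docs.append((start, len(is_first_page)))
    return docs


def pair_with_previous(pages: Sequence[T]) -> List[Tuple[Optional[T], T]]:
    """Hypothetical sequential input: pair each page with its predecessor
    (None for the first page), so a classifier can compare neighbours."""
    return [(pages[i - 1] if i > 0 else None, page) for i, page in enumerate(pages)]


def boundary_accuracy(pred: Sequence[bool], gold: Sequence[bool]) -> float:
    """Share of pages whose predicted boundary flag matches the gold flag."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("pred and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


if __name__ == "__main__":
    flags = [True, False, False, True, True, False]
    print(segment_stream(flags))  # [(0, 3), (3, 4), (4, 6)]
    print(pair_with_previous(["p1", "p2"]))  # [(None, 'p1'), ('p1', 'p2')]
    print(boundary_accuracy([True, False, True, True], [True, False, False, True]))  # 0.75
```

Pairing each page with its predecessor is one simple way to give a classifier the sequential context the abstract mentions; the visual and textual features that the paper's model would attach to each page are left abstract here.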
Pages: 127–150
Page count: 24
Published in: Language Resources and Evaluation, 2021, vol. 55, pp. 127–150