Multi-modal page stream segmentation with convolutional neural networks

Cited by: 14
Authors
Wiedemann, Gregor [1]
Heyer, Gerhard [2]
Affiliations
[1] Hamburg Univ, Dept Comp Sci, Vogt Kolln Str 30, D-22527 Hamburg, Germany
[2] Univ Leipzig, Dept Comp Sci, Augustuspl 9, D-04109 Leipzig, Germany
Keywords
Page stream segmentation; Document flow segmentation; Convolutional neural nets; Text classification; Digital mailroom; CLASSIFICATION;
DOI
10.1007/s10579-019-09476-2
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In recent years, (retro-)digitizing paper-based files became a major undertaking for private and public archives as well as an important task in electronic mailroom applications. As first steps, the workflow usually involves batch scanning and optical character recognition (OCR) of documents. In the case of multi-page documents, the preservation of document contexts is a major requirement. To facilitate workflows involving very large amounts of paper scans, page stream segmentation (PSS) is the task to automatically separate a stream of scanned images into coherent multi-page documents. In a digitization project together with a German federal archive, we developed a novel approach for PSS based on convolutional neural networks (CNN). As a first project, we combine visual information from scanned images with semantic information from OCR-ed texts for this task. The multi-modal combination of features in a single classification architecture allows for major improvements towards optimal document separation. Further to multimodality, our PSS approach profits from transfer-learning and sequential page modeling. We achieve accuracy up to 95% on multi-page documents on our in-house dataset and up to 93% on a publicly available dataset.
Pages: 127-150
Page count: 24