Automatic classification of company's document stream: Comparison of two solutions

Times cited: 0
Authors
Voerman, Joris [1]
Mahamoud, Ibrahim Souleiman [1]
Coustaty, Mickael [1]
Joseph, Aurelie [2]
d'Andecy, Vincent Poulain [2]
Ogier, Jean-Marc [1]
Affiliations
[1] La Rochelle Univ, L3i, Ave Michel Crepeau, F-17042 La Rochelle, France
[2] Yooz, 1 Rue Fleming, F-17000 La Rochelle, France
Keywords
Document processing; Imbalanced classification; Neural network
DOI
10.1016/j.patrec.2023.06.012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Documents are essential nowadays and are present everywhere. To manage the vast number of documents that companies handle, a first step consists of automatically determining the type of each document (its class). Although automatic classification has been widely studied in the literature, the strongly imbalanced context and industrial constraints bring new challenges that have not been studied until now: how can as many documents as possible be classified with the highest precision, in an imbalanced context and with some classes missing during training? To this end, this paper studies two different solutions to address these issues. The first is a multimodal neural network, reinforced by an attention model and an adapted loss function, that is able to classify a great variety of documents. The second is a combination method that uses a cascade of systems to offer a gradual solution for each issue. Both options give good results in the ideal context as well as in the imbalanced context. This comparison also outlines their limitations and the future challenges. © 2023 Elsevier B.V. All rights reserved.
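The abstract describes the second solution only as a cascade of systems that gives a gradual answer to each issue, under the goal of classifying as many documents as possible with the highest precision. The sketch below illustrates that general idea under stated assumptions; it is not the authors' system. It assumes each stage returns a label with a confidence score, that a document is accepted by the first stage whose confidence clears that stage's threshold, and that documents rejected by every stage (for example, from a class missing during training) are left unclassified rather than forced into a known class. `CascadeStage`, `keyword_stage`, `generic_stage`, the thresholds, `evaluate` and the sample documents are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stage interface: takes the document text, returns (label, confidence in [0, 1]).
Predictor = Callable[[str], Tuple[str, float]]


@dataclass
class CascadeStage:
    name: str
    predict: Predictor
    threshold: float  # minimum confidence for this stage's label to be accepted


def classify_cascade(doc: str, stages: Sequence[CascadeStage]) -> Tuple[Optional[str], Optional[str]]:
    """Return (label, stage name) from the first stage confident enough, else (None, None)."""
    for stage in stages:
        label, confidence = stage.predict(doc)
        if confidence >= stage.threshold:
            return label, stage.name
    # Every stage rejected the document: leave it unclassified (e.g. for manual processing).
    return None, None


def keyword_stage(doc: str) -> Tuple[str, float]:
    # Toy stand-in for a specialised, high-precision system covering one frequent class.
    if "invoice" in doc.lower():
        return "invoice", 0.99
    return "unknown", 0.0


def generic_stage(doc: str) -> Tuple[str, float]:
    # Toy stand-in for a broader model that returns graded confidences.
    text = doc.lower()
    if "receipt" in text:
        return "receipt", 0.8
    if "contract" in text:
        return "contract", 0.6
    return "other", 0.3


# Hypothetical cascade: the specialised stage runs first, the broader one second.
STAGES = [
    CascadeStage("keywords", keyword_stage, threshold=0.9),
    CascadeStage("generic", generic_stage, threshold=0.7),
]


def evaluate(samples: List[Tuple[str, str]], stages: Sequence[CascadeStage]) -> Tuple[float, float]:
    """Coverage = share of documents accepted; precision = share of accepted labels that are correct."""
    accepted = correct = 0
    for doc, truth in samples:
        label, _ = classify_cascade(doc, stages)
        if label is not None:
            accepted += 1
            correct += int(label == truth)
    coverage = accepted / len(samples)
    precision = correct / accepted if accepted else 0.0
    return coverage, precision


# Hypothetical labelled documents, invented for illustration only.
SAMPLES = [
    ("Invoice no. 42", "invoice"),
    ("Receipt for a taxi ride", "receipt"),
    ("Receipt attached to invoice 7", "receipt"),  # first stage accepts a wrong label
    ("Contract of employment", "contract"),        # generic stage not confident enough
    ("Handwritten note", "note"),                  # class unseen by every stage
]

if __name__ == "__main__":
    coverage, precision = evaluate(SAMPLES, STAGES)
    print(f"coverage={coverage:.2f} precision={precision:.2f}")
```

On these toy samples, three of five documents are accepted and two of those are correct, so coverage is 0.60 and precision about 0.67. Raising a stage's threshold trades coverage for precision. Placing narrow, high-precision stages before broader ones is one plausible reading of "a gradual solution for each issue", but this record does not describe the paper's actual stages.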
Pages: 181–187
Number of pages: 7