Automatic classification of company's document stream: Comparison of two solutions

Authors
Voerman, Joris [1]
Mahamoud, Ibrahim Souleiman [1]
Coustaty, Mickael [1]
Joseph, Aurelie [2]
d'Andecy, Vincent Poulain [2]
Ogier, Jean-Marc [1]
Affiliations
[1] La Rochelle Univ, L3i, Ave Michel Crepeau, F-17042 La Rochelle, France
[2] Yooz, 1 Rue Fleming, F-17000 La Rochelle, France
Keywords
Document processing; Imbalanced classification; Neural network
DOI
10.1016/j.patrec.2023.06.012
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Documents are essential nowadays and present everywhere. To manage the vast number of documents that companies handle, a first step is to automatically determine the type of each document (its class). Although automatic classification has been widely studied in the state of the art, strongly imbalanced data and industrial constraints bring new challenges that have not been studied until now: how to classify as many documents as possible with the highest precision, in an imbalanced context and with some classes missing during training? To this end, this paper studies two different solutions to these issues. The first is a multimodal neural network, reinforced by an attention model and an adapted loss function, that is able to classify a great variety of documents. The second is a combination method that uses a cascade of systems to offer a gradual solution for each issue. Both options provide good results in the ideal context as well as in the imbalanced context. This comparison outlines the limitations and the future challenges. © 2023 Elsevier B.V. All rights reserved.
Pages: 181-187
Number of pages: 7