Automatic indexing of health documents in French: Evaluating and analysing errors

Cited by: 5
Authors
Chebil, W. [1, 2]
Soualmia, L. F. [2]
Dahamna, B. [2]
Darmoni, S. J. [2]
Affiliations
[1] Univ Monastir, Unite Rech MARS, Monastir, Tunisia
[2] CHU Rouen, LITIS TIBS EA 4108, Equipe CISMeF, F-76031 Rouen, France
Keywords
TEXT; MESH;
DOI
10.1016/j.irbm.2012.10.002
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
The Catalogue and Index of French Medical Sites (CISMeF) was developed to help health professionals, patients and medical students retrieve relevant medical information on the Internet. The gathered resources are indexed manually, semi-automatically or automatically. Currently, the CISMeF indexing function indexes only the subset of resources judged less important. Objectives. - The objective of this work is to evaluate the indexing function developed for CISMeF and to analyse the errors it generates. Material and method. - We used 500 clinical guidelines to evaluate the indexing function, which has been based on the "bag of words" algorithm since its implementation. The automatically generated index is compared with the manual one, which is considered the "gold standard". We analyse the automatic indexing of short titles and associated subtitles, of long titles and associated subtitles, of long and short titles and associated subtitles together, and of abstracts. The measures used for the evaluation are Precision, Recall and F-measure. Results. - For the indexing of short titles and subtitles, the precision is 0.56 and the recall is 0.21. For long titles and subtitles, the precision is 0.39 and the recall is 0.27. For abstracts, the precision is 0.23 and the recall is 0.61. Thirteen categories of errors were identified by analysing the indexing function. The indexing of short titles and subtitles generated the fewest errors leading to the presence of wrong descriptors (0.97 errors per short title and its subtitles). The indexing of long titles and subtitles generated the most errors leading to the absence of relevant descriptors (2.52 errors per long title and its subtitles). Conclusion. - The evaluation of the indexing function showed that it should be used only for short titles and subtitles. Having identified the causes of the errors, we aim to improve the performance of the automatic indexing function, which will allow more medical documents to be indexed. (C) 2012 Elsevier Masson SAS. All rights reserved.
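The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that each index is treated as a set of descriptors and that the F-measure is the balanced harmonic mean of precision and recall; the abstract does not state how the measures were computed or averaged. The function name `evaluate_index` and the descriptor sets are hypothetical and are not taken from the paper.

```python
def evaluate_index(automatic, manual):
    """Compare an automatic descriptor set with the manual (gold standard) set.

    Returns (precision, recall, f_measure). Assumes set-based matching and the
    balanced F-measure; both are assumptions, not the paper's stated protocol.
    """
    automatic, manual = set(automatic), set(manual)
    correct = automatic & manual  # descriptors present in both indexes

    # Precision: share of automatically assigned descriptors that are correct.
    precision = len(correct) / len(automatic) if automatic else 0.0
    # Recall: share of gold-standard descriptors that were found.
    recall = len(correct) / len(manual) if manual else 0.0

    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical descriptors for one clinical guideline (illustration only).
manual_index = {"asthma", "child", "practice guideline", "therapeutics"}
automatic_index = {"asthma", "child", "lung"}

p, r, f = evaluate_index(automatic_index, manual_index)
print(f"precision={p:.2f} recall={r:.2f} f_measure={f:.2f}")
```

In this toy case the wrong descriptor "lung" lowers precision, while the two missing descriptors lower recall, which mirrors the two kinds of error the abstract distinguishes (presence of wrong descriptors and absence of relevant ones).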
Pages: 316-329
Page count: 14