Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies

Cited by: 9
Authors
Dinh, Duy [1]
Tamine, Lynda [1]
Boubekeur, Fatiha [2]
Affiliations
[1] Univ Toulouse 3, Inst Rech Informat Toulouse, F-31062 Toulouse, France
[2] Mouloud Mammeri Univ, Dept Comp Sci, Tizi Ouzou 15000, Algeria
Keywords
Multi-terminology indexing; Voting techniques; Document/query expansion; Concept extraction; Biomedical retrieval; QUERY EXPANSION; INFORMATION-RETRIEVAL; TEXT; DICTIONARY; GENE
DOI
10.1016/j.artmed.2012.08.006
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Objective: The aim of this work is to evaluate, on the available TREC Genomics collections, a set of indexing and retrieval strategies that integrate several biomedical terminologies for an ad hoc information retrieval (IR) task.
Materials and methods: We propose a multi-terminology concept extraction approach that selects the best concepts from free text by means of voting techniques. We instantiate this general approach on four terminologies (MeSH, SNOMED, ICD-10 and GO). We focus in particular on the effect of integrating terminologies into a biomedical IR process, and on the utility of voting techniques for combining the concepts extracted from each document into a single list of unique concepts.
Results: Experimental studies conducted on the TREC Genomics collections show that our multi-terminology IR approach based on voting techniques yields statistically significant improvements over the baseline. For example, on the 2005 TREC Genomics collection, it improves MAP (mean average precision) by +6.98% (p<0.05) compared to the baseline. In addition, our experimental results show that document expansion using preferred terms, combined with query expansion using terms from top-ranked expanded documents, improves biomedical IR effectiveness.
Conclusion: We have evaluated several voting models for combining concepts drawn from multiple terminologies. Through this study, we presented several factors affecting the effectiveness of a biomedical IR system, including term weighting, query expansion, and document expansion models. An appropriate combination of those factors could help improve IR performance. © 2012 Elsevier B.V. All rights reserved.
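The idea of combining concepts extracted from several terminologies by voting can be illustrated with a minimal sketch. The abstract does not say which voting models were evaluated, so the CombMNZ-style fusion below is an assumption, as are the function name `comb_mnz`, the per-terminology max normalization, and the example scores. Concepts are matched across terminologies by plain string equality here, which a real system would replace with a proper concept mapping.

```python
from collections import defaultdict


def comb_mnz(rankings):
    """Fuse per-terminology concept scores with a CombMNZ-style vote.

    rankings: dict mapping a terminology name to {concept: score}.
    Returns a list of (concept, fused_score), best first.
    """
    total = defaultdict(float)  # sum of normalized scores per concept
    votes = defaultdict(int)    # number of terminologies that returned the concept
    for scores in rankings.values():
        if not scores:
            continue
        top = max(scores.values())
        for concept, score in scores.items():
            # Normalize by the best score in this terminology so that
            # terminologies with different score ranges stay comparable.
            total[concept] += score / top if top > 0 else 0.0
            votes[concept] += 1
    fused = {c: total[c] * votes[c] for c in total}
    # Sort by descending fused score, then alphabetically for stable ties.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical extraction scores for one document (illustrative only).
extracted = {
    "MeSH": {"neoplasms": 4.0, "apoptosis": 2.0},
    "SNOMED": {"neoplasms": 3.0, "cell death": 1.5},
    "GO": {"apoptosis": 5.0},
}
ranked = comb_mnz(extracted)
for concept, score in ranked:
    print(f"{concept}\t{score:.2f}")
```

A concept proposed by several terminologies is rewarded twice in this scheme: its normalized scores add up, and the sum is multiplied by the number of terminologies that voted for it. The resulting list contains each concept once, which matches the stated goal of producing a list of unique concepts per document.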
Pages: 155-167
Number of pages: 13
Related papers
50 in total
  • [31] Graph-based Similarity for Document Retrieval in the Biomedical Domain
    Zuluaga, Adelaida A.
    Rosso, Andres A.
    PROCEEDINGS OF 2022 7TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING TECHNOLOGIES, ICMLT 2022, 2022, : 180 - 184
  • [32] Indexing Wood Image for Retrieval based on Kansei factors
    Fu Yali
    Cao Kui
    ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 1099 - 1102
  • [33] MeSH-Based Semantic Weighting Scheme to Enhance Document Indexing: Application on Biomedical Document Classification
    Gabsi, Imen
    Kammoun, Hager
    Souidi, Dalila
    Amous, Ikram
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2024,
  • [34] MeSH-Based Semantic Indexing Approach to Enhance Biomedical Information Retrieval
    Kammoun, Hager
    Gabsi, Imen
    Amous, Ikram
    COMPUTER JOURNAL, 2022, 65 (03): : 516 - 536
  • [35] Information Retrieval Approach based on Indexing Text Documents: Application to Biomedical Domain
    Boukhari, Kabil
    Omri, Mohamed Nazih
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017, : 2213 - 2220
  • [36] MeSHup: A Corpus for Full Text Biomedical Document Indexing
    Wang, Xindi
    Mercer, Robert E.
    Rudzicz, Frank
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 5473 - 5483
  • [37] A FUZZY DOCUMENT-RETRIEVAL METHOD BASED ON 2-VALUED INDEXING
    MURAI, T
    MIYAKOSHI, M
    SHIMBO, M
    FUZZY SETS AND SYSTEMS, 1989, 30 (02) : 103 - 120
  • [38] OPTIMUM DEPTH OF INDEXING IN DESCRIPTOR (DOCUMENT-BASED) INFORMATION RETRIEVAL SYSTEMS
    SAGALOVICH, NM
    NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 2-INFORMATSIONNYE PROTSESSY I SISTEMY, 1971, (03): : 19 - +
  • [39] Morpheme-based, cross-lingual indexing for medical document retrieval
    Schulz, S
    Hahn, U
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2000, 58 : 87 - 99
  • [40] Using Latent Semantic Indexing for Morph-based Spoken Document Retrieval
    Turunen, Ville T.
    Kurimo, Mikko
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 341 - 344