Multi-word terms selection for information retrieval

Cited by: 1
Authors
Bechikh Ali, Chedi [1]
Haddad, Hatem [2]
Slimani, Yahya [3]
Affiliations
[1] Univ Carthage, Inst Natl Sci Appliquees & Technol INSAT, LISI, Tunis, Tunisia
[2] ICompass, Tunis, Tunisia
[3] Univ Manouba, Inst Super Arts Multimedia ISAMM, Manouba, Tunisia
Keywords
Performance measurement; Statistics; Information systems; Information retrieval; Information science; Collection management; Indexing; Multi-word terms; Association measure; Precision; Bilingual lexicons
DOI
10.1108/IDD-12-2021-0142
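Chinese Library Classification
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract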
Purpose: Many approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of them suffer from poor precision at low recall levels. The choice of indexing units has a great impact on search system effectiveness. The authors go beyond simple-term indexing and propose a framework for filtering and indexing multi-word terms (MWT).
Design/methodology/approach: The authors rank MWT and filter them, keeping the most effective ones for the indexing process. The proposed model filters MWT according to their ability to capture the document topic and to distinguish between different documents from the same collection. The authors rely on the hypothesis that the best MWT are those that achieve the greatest association degree. The experiments are carried out on English and French data sets.
Findings: The results indicate that this approach improves precision at low recall and performs better than more advanced models based on term dependencies.
Originality/value: The authors use and test different association measures to select the MWT that best describe the documents, in order to enhance precision among the first retrieved documents.
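The filtering idea in the abstract, ranking candidate MWT by an association measure and keeping only the top-ranked ones for indexing, can be illustrated with a minimal sketch. The abstract does not name the association measures the authors tested, so pointwise mutual information (PMI) over adjacent word pairs is used here purely as an assumed stand-in. The function names `rank_bigrams_by_pmi` and `select_top_mwt`, the `min_count` threshold, and the restriction to two-word candidates are all hypothetical choices made for illustration, not the authors' method.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(docs, min_count=2):
    """Rank adjacent word pairs by PMI (an assumed association measure).

    docs is a list of token lists. Pairs seen fewer than min_count times
    are skipped, because PMI is unreliable for rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        # Adjacent pairs within one document are the MWT candidates.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest association first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_top_mwt(docs, k=10, min_count=2):
    """Keep only the k best-ranked candidates as indexing units."""
    return [pair for pair, _ in rank_bigrams_by_pmi(docs, min_count)[:k]]


docs = [
    "information retrieval uses the index".split(),
    "information retrieval ranks the documents".split(),
    "the index and the documents".split(),
]
print(select_top_mwt(docs, k=1))
```

In this toy collection the pair ("information", "retrieval") ranks above pairs that start with the frequent word "the", which is the behaviour a filter of this kind is meant to have: pairs whose words co-occur more often than their individual frequencies predict are kept as indexing units.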
Pages: 74-87
Page count: 14