Method of Lexical Enrichment in Information Retrieval System in Arabic

Cited by: 9
Authors
Mallat, Souheyl [1]
Zouaghi, Anis [2]
Hkiri, Emna [1]
Zrigui, Mounir [1]
Affiliations
[1] Univ Monastir, Dept Comp Sci, Monastir, Tunisia
[2] Sousse Univ, Dept Comp Sci, Higher Inst Appl Sci & Technol Sousse, Sousse, Tunisia
Keywords
Arabic NL; Information Retrieval; Lexical Enrichment; Query Enrichment; Weighting;
DOI
10.4018/ijirr.2013100103
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, the authors propose a method for the lexical enrichment of Arabic queries, with the aim of improving the performance of information retrieval systems (SRI). The method combines two types of enrichment: linguistic and contextual. The linguistic enrichment is based on linguistic analysis (lemmatization, followed by morphological, syntactic and semantic analysis), whose goal is to generate a descriptive list (list-desc). This list contains the set of lexical items assigned to each significant term in the query. The contextual enrichment integrates contextual information derived from the corpus documents. It is based on statistical analysis using Salton's weighting functions, TF-IDF and TF-IEF. The TF-IDF function is applied between the list-desc and the documents in the corpus to identify relevant documents. The TF-IEF function is then applied between the list-desc and the sentences of those relevant documents to identify relevant sentences. Finally, the terms in these sentences are weighted, and those with the highest weights, which are considered the richest in informative and contextual importance, are added to the original query. The authors evaluated their lexical enrichment method on a corpus of documents from a specialized domain, and the results show its benefit in terms of precision and recall.
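The contextual enrichment stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes a smoothed TF-IDF, treats TF-IEF as the same formula computed over sentences instead of documents, and weights each candidate term by the summed score of the selected sentences that contain it. The names `weight_scores` and `enrich_query` are hypothetical, and English tokens stand in for lemmatized Arabic terms.

```python
import math
from collections import Counter

def weight_scores(list_desc, units):
    """Score each unit (a list of tokens) against list_desc.

    Assumption: score = sum over matching terms of tf * log(1 + N / df),
    a smoothed TF-IDF. With documents as units this plays the role of
    TF-IDF; with sentences as units it stands in for TF-IEF.
    """
    n = len(units)
    df = Counter(t for u in units for t in set(u))
    scores = []
    for u in units:
        tf = Counter(u)
        scores.append(sum((tf[t] / len(u)) * math.log(1 + n / df[t])
                          for t in list_desc if tf[t]))
    return scores

def enrich_query(list_desc, documents, top_docs=1, top_sents=2, top_terms=1):
    """documents: list of documents, each a list of sentences (token lists)."""
    # Step 1: rank documents against the descriptive list.
    doc_tokens = [[t for s in d for t in s] for d in documents]
    doc_scores = weight_scores(list_desc, doc_tokens)
    best_docs = sorted(range(len(documents)),
                       key=lambda i: -doc_scores[i])[:top_docs]
    # Step 2: rank the sentences of the selected documents.
    sentences = [s for i in best_docs for s in documents[i]]
    sent_scores = weight_scores(list_desc, sentences)
    best_sents = sorted(range(len(sentences)),
                        key=lambda i: -sent_scores[i])[:top_sents]
    # Step 3 (assumed weighting): a new term's weight is the summed score
    # of the selected sentences in which it occurs.
    weights = Counter()
    for i in best_sents:
        for t in set(sentences[i]):
            if t not in list_desc:
                weights[t] += sent_scores[i]
    # Step 4: append the highest-weighted terms to the query.
    return list(list_desc) + [t for t, _ in weights.most_common(top_terms)]

# Toy corpus: three documents, each split into tokenized sentences.
corpus = [
    [["retrieval", "index"],
     ["retrieval", "index", "query", "ranking"],
     ["library", "opening", "hours"]],
    [["weather", "rain"], ["sun", "cloud"]],
    [["football", "match"], ["goal", "score"]],
]
print(enrich_query(["retrieval"], corpus))
```

In this toy corpus only the first document matches the descriptive list, and "index" occurs in both of its top-scoring sentences, so it is the term added to the query. The linguistic stage that builds the list-desc (lemmatization and morphological, syntactic and semantic analysis) is not sketched here.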
Pages: 35-51
Number of pages: 17