Semantically Enhanced Term Frequency based on Word Embeddings for Arabic Information Retrieval

Cited by: 0
Authors
El Mahdaouy, Abdelkader [1, 2]
El Alaoui, Said Ouatik [1]
Gaussier, Eric [2]
Affiliations
[1] Univ USMBA, FSDM, LIM, Fes, Morocco
[2] Univ Grenoble Alpes, CNRS, LIG, AMA, Grenoble, France
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Traditional Information Retrieval (IR) models are based on the bag-of-words paradigm, in which relevance scores are computed from exact keyword matches. Although these models already achieve good performance, it has been shown that most cases of dissatisfaction with relevance are due to term mismatch between queries and documents. In this paper, we introduce a novel method for computing term frequency based on semantic similarities, using distributed representations of words in a vector space (word embeddings). Our main goal is to allow distinct but semantically related terms to match each other and contribute to the relevance scores. Hence, Arabic documents are retrieved beyond the bag-of-words paradigm, based on semantic similarities between word vectors. Results on standard Arabic TREC data sets show significant improvement over the baseline bag-of-words models.
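
The abstract does not give the exact formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a document term counts toward a query term's frequency when the cosine similarity of their word vectors reaches a threshold, and that each such occurrence is weighted by that similarity. The function names, the `threshold` parameter, and the toy two-dimensional embeddings are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_tf(query_term, doc_tokens, embeddings, threshold=0.7):
    """Exact-match count plus similarity-weighted counts of related terms.

    Assumption (not from the paper): a related term contributes
    similarity * count whenever similarity >= threshold.
    """
    counts = Counter(doc_tokens)
    tf = float(counts.get(query_term, 0))  # classic bag-of-words count
    q_vec = embeddings.get(query_term)
    if q_vec is None:
        return tf  # no vector available: fall back to exact matching
    for term, count in counts.items():
        if term == query_term or term not in embeddings:
            continue
        sim = cosine(q_vec, embeddings[term])
        if sim >= threshold:
            tf += sim * count  # related term contributes to the score
    return tf


# Toy, hand-made vectors purely for illustration.
toy_embeddings = {
    "car": [1.0, 0.0],
    "automobile": [1.0, 0.0],
    "banana": [0.0, 1.0],
}
doc = ["car", "automobile", "automobile", "banana"]
enhanced = semantic_tf("car", doc, toy_embeddings)
exact = doc.count("car")
```

In this toy document the exact-match count for "car" is 1, while the enhanced count also credits the two occurrences of "automobile". A value of this kind could then be substituted for the raw term frequency in a standard ranking function; how the paper does this is not stated in the abstract.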
Pages: 385-389
Page count: 5