Semantically Enhanced Term Frequency based on Word Embeddings for Arabic Information Retrieval

Citations: 0
Authors
El Mahdaouy, Abdelkader [1, 2]
El Alaoui, Said Ouatik [1]
Gaussier, Eric [2]
Affiliations
[1] Univ USMBA, FSDM, LIM, Fes, Morocco
[2] Univ Grenoble Alpes, CNRS, LIG, AMA, Grenoble, France
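Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract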
Traditional Information Retrieval (IR) models are based on the bag-of-words paradigm, where relevance scores are computed from exact matching of keywords. Although these models already achieve good performance, it has been shown that most dissatisfaction cases in relevance are due to term mismatch between queries and documents. In this paper, we introduce a novel method to compute term frequency based on semantic similarities, using distributed representations of words in a vector space (word embeddings). Our main goal is to allow distinct but semantically related terms to match each other and contribute to the relevance scores. Hence, Arabic documents are retrieved beyond the bag-of-words paradigm, based on semantic similarities between word vectors. Results on standard Arabic TREC data sets show significant improvement over the baseline bag-of-words models.
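The abstract does not give the exact formula, so the following is only a minimal sketch of the general idea, under stated assumptions: a query term's frequency in a document is its exact count plus similarity-weighted counts of other document terms whose cosine similarity to the query term reaches a cutoff. The function names `cosine` and `semantic_tf`, the `threshold` parameter, and the toy two-dimensional vectors (English words for readability) are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_tf(query_term, doc_tokens, embeddings, threshold=0.7):
    """Exact count of query_term plus similarity-weighted counts of related terms.

    Assumption (not from the paper): a document term contributes
    sim * count only when its cosine similarity to the query term
    is at least `threshold`.
    """
    counts = Counter(doc_tokens)
    total = float(counts.get(query_term, 0))
    q_vec = embeddings.get(query_term)
    if q_vec is None:
        # No vector for the query term: fall back to exact matching.
        return total
    for term, count in counts.items():
        if term == query_term:
            continue
        t_vec = embeddings.get(term)
        if t_vec is None:
            continue
        sim = cosine(q_vec, t_vec)
        if sim >= threshold:
            total += sim * count
    return total


# Toy embeddings, purely illustrative.
toy_embeddings = {
    "car": [1.0, 0.0],
    "automobile": [0.8, 0.6],
    "banana": [0.0, 1.0],
}
doc = ["car", "automobile", "automobile", "banana"]
score = semantic_tf("car", doc, toy_embeddings)
print(score)
```

In this toy document, "automobile" adds to the frequency of "car" even though the two strings never match exactly, while "banana" adds nothing. A value computed this way could then stand in for the raw term frequency in a bag-of-words scoring function; how the paper actually integrates it is not stated in the abstract.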
Pages: 385-389
Number of pages: 5
Related papers
50 in total
  • [21] Semantically-enhanced information retrieval using multiple knowledge sources
    Yuncheng Jiang
    Cluster Computing, 2020, 23 : 2925 - 2944
  • [22] Exploring Term Proximity Statistic for Arabic Information Retrieval
    El Mahdaouy, Abdelkader
    Gaussier, Eric
    El Alaoui, Said Ouatik
    2014 THIRD IEEE INTERNATIONAL COLLOQUIUM IN INFORMATION SCIENCE AND TECHNOLOGY (CIST'14), 2014, : 272 - 277
  • [23] From Word Embeddings To Document Similarities for Improved Information Retrieval in Software Engineering
    Ye, Xin
    Shen, Hui
    Ma, Xiao
    Bunescu, Razvan
    Liu, Chang
    2016 IEEE/ACM 38TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2016, : 404 - 415
  • [24] AN ACCURACY-ENHANCED STEMMING ALGORITHM FOR ARABIC INFORMATION RETRIEVAL
    Bessou, Sadik
    Touahria, Mohamed
    NEURAL NETWORK WORLD, 2014, 24 (02) : 117 - 128
  • [25] Multi Word Term Queries for Focused Information Retrieval
    SanJuan, Eric
    Ibekwe-SanJuan, Fidelia
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2010, 6008 : 590 - +
  • [26] Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation
    Zouaghi, Anis
    Merhbene, Laroussi
    Zrigui, Mounir
    ARTIFICIAL INTELLIGENCE REVIEW, 2012, 38 (04) : 257 - 269
  • [27] Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation
    Anis Zouaghi
    Laroussi Merhbene
    Mounir Zrigui
    Artificial Intelligence Review, 2012, 38 : 257 - 269
  • [28] Using Word Embeddings for Query Translation for Hindi to English Cross Language Information Retrieval
    Bhattacharya, Paheli
    Goyal, Pawan
    Sarkar, Sudeshna
    COMPUTACION Y SISTEMAS, 2016, 20 (03) : 435 - 447
  • [29] Towards Useful Word Embeddings Evaluation on Information Retrieval, Text Classification, and Language Modeling
    Novotny, Vit
    Stefanik, Michal
    Luptak, David
    Sojka, Petr
    RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING (RASLAN 2020), 2020, : 37 - 46
  • [30] Information Retrieval Based on Word Semantic Clustering
    Chang, Chia-Yang
    Lin, Yan-Ting
    Lee, Shie-Jue
    Lai, Chih-Chin
    2018 11TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2018), 2018,