Enhanced query expansion in English-Arabic CLIR

Cited by: 7
Authors
Bellaachia, Abdelghani [1]
Amor-Tijani, Ghita [1]
Affiliation
[1] Dept Comp Sci, Washington, DC 20052 USA
DOI
10.1109/DEXA.2008.52
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Arabic is a language with a particularly large vocabulary, rich in words with synonymous shades of meaning. Modern Standard Arabic, which is used in formal writing, is the ancient Arabic language with loanwords incorporated from foreign languages. Different synonyms and loanwords tend to be used in different writings. Indeed, the Arabic composition style tends to vary across the Arab countries (Abdelali, 2004). Relevant documents can be overlooked when the query terms are synonyms of, or related to, the terms used in the document collection. This can degrade the performance of a Cross-Lingual Information Retrieval (CLIR) system. Query Expansion (QE) using the document collection is the usual approach taken to enrich translated queries with context-related terms. In this study, QE is explored for an English-Arabic CLIR system in which English queries are used to search Arabic documents. A thesaurus-based disambiguation approach is applied to further optimize the effectiveness of that technique. Experimental results show that QE enhanced by disambiguation improves retrieval effectiveness.
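The general idea in the abstract, expanding a translated query with terms from the document collection and then filtering those terms with a thesaurus, can be illustrated with a minimal sketch. This is not the authors' method: the scoring function, the filter rule, the function names `rank` and `expand`, and the transliterated toy vocabulary are all assumptions made for illustration only.

```python
from collections import Counter


def rank(query, docs):
    """Return indices of matching documents, best first.

    The score is a plain count of query-term occurrences (an assumption;
    a real system would use a proper retrieval model).
    """
    scores = [(sum(doc.count(t) for t in query), i) for i, doc in enumerate(docs)]
    ordered = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ordered if score > 0]


def expand(query, docs, thesaurus, top_docs=2, max_terms=3):
    """Expand a query from the top-ranked documents, filtered by a thesaurus."""
    feedback = rank(query, docs)[:top_docs]
    # Candidate terms: everything in the feedback documents not already in the query.
    counts = Counter(t for i in feedback for t in docs[i] if t not in query)
    # Hypothetical filter: keep only candidates the thesaurus relates to a query term.
    related = set().union(*(thesaurus.get(t, set()) for t in query))
    kept = [t for t, _ in counts.most_common() if t in related]
    return list(query) + kept[:max_terms]


# Toy collection of transliterated placeholder tokens (not real data).
docs = [
    ["sayyara", "tariq", "muharrik"],
    ["markaba", "tariq", "sayyara"],
    ["kitab", "maktaba"],
    ["markaba", "jadida"],
]
thesaurus = {"sayyara": {"markaba", "muharrik"}}  # hypothetical entries

query = ["sayyara"]
expanded = expand(query, docs, thesaurus)
print(expanded)
print(rank(query, docs), rank(expanded, docs))
```

In this toy run the frequent but unrelated term `tariq` is dropped by the thesaurus filter, and the expanded query also reaches the fourth document, which uses only the synonym `markaba`.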
Pages: 61-66
Page count: 6