On the Use of Arabic Stemmers to Increase the Recall of Information Retrieval Systems

Authors
Nasra, Ihab [1]
Maree, Mohammed [2]
Affiliations
[1] Arab American University, Department of Computer Science, Jenin, Palestine
[2] Arab American University, Department of Information Technology, Jenin, Palestine
Keywords
Information Retrieval; Arabic Stemming; Morphological Analysis; Natural Language Processing; Rule-Based Stemmers
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Building robust information retrieval systems requires efficient natural language processing and morphological analysis techniques. These techniques are commonly used to find syntactic and semantic matches between users' queries and the documents that satisfy them. Word stemming is one such technique and has been widely employed in information retrieval systems, particularly to increase their recall. Much research has evaluated English stemming techniques; however, little attention has been given to Arabic stemmers. In this work, we present a comprehensive review of state-of-the-art Arabic stemming techniques and compare them according to a variety of criteria. We classify existing Arabic stemmers into four categories: root-based, affix-removal, rule-based, and context-based techniques. We review seven of the most commonly used Arabic stemming algorithms that fall under these categories and provide a comparative analysis and evaluation of them according to the goal, input, employed approach, and output of each technique. We conclude by proposing a hybrid Arabic stemming approach that combines multiple stemmers and exploits a new set of rules to better stem Arabic words.
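To illustrate how affix-removal stemming raises recall, the following is a minimal sketch, not one of the seven algorithms reviewed in the paper and not the proposed hybrid approach. The affix lists, the minimum stem length, and the function name light_stem are hypothetical, chosen only for illustration. The sketch strips at most one prefix and one suffix, longest match first, so that inflected forms of the same word map to a common index term.

```python
# Hypothetical, simplified affix lists for illustration only
# (not taken from the paper or from any specific published stemmer).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه"]

MIN_STEM_LEN = 2  # assumed minimum number of letters left after stripping


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM_LEN:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM_LEN:
            word = word[: -len(s)]
            break
    return word


# Three inflected forms ("the teachers" masc., "and the teachers" fem.,
# "teacher" fem.) all reduce to the same term, so a query for it matches all three.
forms = ["المعلمون", "والمعلمات", "معلمة"]
stems = {light_stem(w) for w in forms}
```

Because every form collapses to a single stem, a query term indexed the same way retrieves documents containing any of the variants, which is the recall gain the abstract refers to. Light stripping like this cannot handle infixes or broken plurals, which is one motivation for the root-based and rule-based categories.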
Pages: 2462-2468
Number of pages: 7