Combining N-Grams and Stemming for Arabic Word-Based Inexact Matching and Term Conflation

Cited by: 3

Authors:

Mustafa, Suleiman H. [1]

Affiliations:

[1] Yarmouk Univ, Dept Comp Informat Syst, Irbid, Jordan

Source:

Journal of Information & Knowledge Management | 2005 / Vol. 4 / No. 1

Keywords:

N-grams; Arabic string matching; text searching; stemming; information retrieval; word conflation;

DOI:

10.1142/S0219649205000992

Chinese Library Classification (CLC):

G25 [Library science, librarianship]; G35 [Information science, information work];

Discipline classification codes:

1205 ; 120501 ;

Abstract:

In this paper, the results of three N-gram techniques have been reported. Two of these techniques were based on the idea of combining N-grams and stemming. The first used first-order stemming, while the other used light stemming. The performance of the combined approach was then compared with that of pure conventional N-gram-based string matching. The results provide good evidence that combining N-grams with stemming improves the overall performance, as measured by word-match recall and word-match precision, using different similarity threshold values.
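The general idea of combining stemming with N-gram matching can be illustrated with a minimal Python sketch. This record does not give the paper's similarity measure, N-gram size, or stemming rules, so everything below is an assumption made for illustration: character bigrams, the Dice coefficient, and a hypothetical `light_stem` helper with a deliberately tiny affix list. It is not the author's implementation.

```python
def ngrams(word, n=2):
    """Return the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient over the n-gram sets of two words (assumed measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        # Words shorter than n have no n-grams; fall back to exact equality.
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# Hypothetical, illustrative affix lists; NOT the stemmer used in the paper.
PREFIXES = ("وال", "بال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ة")


def light_stem(word, min_len=3):
    """Strip at most one listed prefix and one listed suffix."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word


def match(a, b, threshold=0.7, stem=True):
    """Declare a word match when similarity reaches the threshold."""
    if stem:
        a, b = light_stem(a), light_stem(b)
    return dice(a, b) >= threshold


plural = "المعلمون"   # "the teachers"
singular = "معلم"     # "teacher"

pure_score = dice(plural, singular)
stemmed_score = dice(light_stem(plural), light_stem(singular))
print(pure_score, stemmed_score)
```

In this toy pair the affixes lower the pure bigram score, so at a threshold of 0.7 the words match only after stemming. That is the kind of effect the abstract attributes to the combined approach, though the paper's actual figures are not reproduced here.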

Pages: 29-36

Page count: 8