Improving K-Nearest Neighbor Efficacy for Farsi Text Classification

Times cited: 0
Authors
Elahimanesh, Mohammad Hossein [1, 2]
Minaei-Bidgoli, Behrouz [2, 3]
Malekinezhad, Hossein [2, 4]
Affiliations
[1] Islamic Azad Univ, Qazvin Branch, Qazvin, Iran
[2] Comp Res Ctr Islamic Sci, Qom, Iran
[3] Iran Univ Sci & Technol, Tehran, Iran
[4] Islamic Azad Univ, Naragh Branch, Naragh, Iran
Keywords
Text classification; N-grams of characters; K-nearest neighbor
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
One of the common processes in the field of text mining is text classification. Because of the complex nature of the Farsi language, including words written in separate parts and compound verbs, most text classification systems are not applicable to Farsi texts. K-Nearest Neighbors (KNN) is one of the most popular methods for text classification and performs well in experiments on different datasets. This paper proposes a method to improve the classification performance of KNN. The effects of removing or retaining stop words and of applying character N-grams of different lengths are also studied. For this study, a portion of a standard Farsi corpus called Hamshahri and articles from some archived newspapers are used. The results indicate that classification efficiency improves with this approach, especially when eight-gram indexing is used and stop words are removed. N-grams longer than 3 characters gave very encouraging results for Farsi text classification. The classification results of our method are compared with those reported in related works.
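The abstract does not describe the authors' specific improvement to KNN, so the following is only a minimal sketch of a plain KNN baseline over character N-gram features with optional stop-word removal, the two preprocessing factors the study varies. Everything in it is an assumption for illustration: the tiny Farsi stop-word list, taking N-grams over the whole cleaned string (spaces included), cosine similarity on raw N-gram counts, and majority voting with a similarity-sum tie-break. The function names (`remove_stop_words`, `char_ngrams`, `cosine`, `knn_classify`) are hypothetical, and the default `n=8` merely mirrors the eight-gram setting the abstract reports as best.

```python
from collections import Counter
import math

# Hypothetical, illustrative stop-word list (NOT the list used in the paper):
# common Farsi function words meaning "and", "in", "to", "from", "that",
# "this", the object marker "ra", and "with".
STOP_WORDS = {"و", "در", "به", "از", "که", "این", "را", "با"}


def remove_stop_words(text: str) -> str:
    """Drop whitespace-separated tokens that appear in STOP_WORDS."""
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams of length n (spaces included).

    Assumption: a non-empty text shorter than n becomes a single feature.
    """
    if len(text) < n:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3,
                 n: int = 8, drop_stop_words: bool = True) -> str:
    """Label `query` by majority vote among its k most similar training docs.

    `train` is a list of (text, label) pairs. Vote ties are broken by the
    summed similarity of each tied label's neighbours.
    """
    def vectorize(text: str) -> Counter:
        if drop_stop_words:
            text = remove_stop_words(text)
        return char_ngrams(text, n)

    q = vectorize(query)
    # Score every training document, then keep the k highest similarities.
    scored = sorted(
        ((cosine(q, vectorize(text)), label) for text, label in train),
        reverse=True,
    )[:k]
    votes = Counter(label for _, label in scored)
    sim_sum = Counter()
    for sim, label in scored:
        sim_sum[label] += sim
    return max(votes, key=lambda label: (votes[label], sim_sum[label]))
```

A call such as `knn_classify(new_doc, labelled_docs, k=5, n=8)` returns the predicted label; setting `drop_stop_words=False` or varying `n` reproduces, in spirit, the two experimental switches mentioned above. The sketch re-vectorizes the training set on every query for brevity; a practical system would precompute and index the training vectors.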
Pages: 1618–1621
Number of pages: 4