Machine learning for Arabic text categorization

Times cited: 25
Author
Duwairi, Rehab M. [1]
Affiliation
[1] Jordan University of Science and Technology, Department of Computer Information Systems, Irbid, Jordan
DOI
10.1002/asi.20360
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this article, we propose a distance-based classifier for categorizing Arabic text. Each category is represented as a vector of words in an m-dimensional space, and documents are classified on the basis of their closeness to the feature vectors of the categories. In its learning phase, the classifier scans the set of training documents to extract category features that capture inherent category-specific properties; in its testing phase, it uses these previously determined category-specific features to categorize unclassified documents. Stemming was used to reduce the dimensionality of the documents' feature vectors. The accuracy of the classifier was tested by carrying out several categorization tasks on an in-house collected Arabic corpus. The results show that the proposed classifier is very accurate and robust.
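The abstract does not specify the term weighting, the distance measure, or the stemmer, so the following Python sketch fills these in with assumptions, purely to illustrate the two phases described above. Assumed here: each category vector is the summed term frequencies of its training documents, "closeness" is cosine similarity, and `light_stem` is a hypothetical toy affix stripper, not the stemmer used in the article. The function names `train`, `classify`, and `light_stem` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy light stemmer (an assumption, NOT the article's stemmer):
# strips at most one common Arabic prefix and one suffix, keeping >= 3 letters.
PREFIXES = ("وال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ة")


def light_stem(word):
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word


def to_vector(text):
    """Sparse term-frequency vector over stems; the m dimensions are the
    distinct stems seen, so stemming shrinks m."""
    return Counter(light_stem(tok) for tok in text.split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train(labelled_docs):
    """Learning phase: one feature vector per category, built (by assumption)
    by summing the term frequencies of that category's training documents."""
    categories = {}
    for text, label in labelled_docs:
        categories.setdefault(label, Counter()).update(to_vector(text))
    return categories


def classify(text, categories):
    """Testing phase: return the category whose vector is closest to the
    document vector, i.e. has the highest cosine similarity."""
    doc = to_vector(text)
    return max(categories, key=lambda c: cosine(doc, categories[c]))
```

Typical use is `categories = train(pairs)` on a list of `(text, label)` pairs, then `classify(new_text, categories)`. Because the distance measure is isolated in `cosine`, trying a different closeness measure only requires replacing that one function.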
Pages: 1005-1010
Page count: 6