An Arabic text categorization approach using term weighting and multiple reducts

Cited by: 0
Authors
Qasem A. Al-Radaideh
Mohammed A. Al-Abrat
Affiliation
[1] Yarmouk University, Department of Computer Information Systems, Faculty of Information Technology and Computer Sciences
Source
Soft Computing | 2019 / Vol. 23
Keywords
Rough set theory; Arabic text categorization; Reducts extraction; Single reduct; Multiple reducts;
DOI
Not available
Abstract
Text categorization is the process of assigning a predefined category label to an unlabeled document based on its content. One challenge in automatic text categorization is the high dimensionality of the data, which can degrade the performance of the categorization model. This paper proposes an approach to Arabic text categorization that combines term weighting with the reduct concept of rough set theory to reduce the number of terms used to generate the classification rules that form the classifier. The paper also proposes an algorithm for extracting multiple minimal reducts, obtained by improving the Quick reduct algorithm. The multiple reducts are used to generate the set of classification rules that represents the rough set classifier. To evaluate the approach, an Arabic corpus of 2700 documents in nine categories is used. The experiments compare the results of the approach when using multiple minimal reducts and when using a single minimal reduct. The approach achieved an accuracy of 94% with multiple reducts, outperforming the single-reduct method, which achieved 86%. The experiments also showed that the approach outperforms both the K-NN and J48 algorithms in classification accuracy on this dataset.
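The reduct idea mentioned in the abstract can be illustrated with a minimal sketch of the classic greedy Quick reduct procedure from rough set theory. This is not the authors' improved multiple-reduct algorithm, whose details are not given here. The toy documents, the term names, the binary term-presence encoding (instead of term weights), and the function names `dependency` and `quick_reduct` are all assumptions made for illustration.

```python
def dependency(rows, attrs, decision):
    """Rough-set dependency degree: the fraction of rows whose values on
    `attrs` determine the decision label unambiguously (positive region)."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, []).append(row[decision])
    consistent = sum(len(labels) for labels in groups.values()
                     if len(set(labels)) == 1)
    return consistent / len(rows)


def quick_reduct(rows, conditional, decision):
    """Greedy Quick reduct: repeatedly add the attribute that raises the
    dependency degree the most, until it matches that of the full set."""
    full = dependency(rows, conditional, decision)
    reduct = []
    best = dependency(rows, reduct, decision)
    while best < full:
        candidate, candidate_gamma = None, best
        for attr in conditional:
            if attr in reduct:
                continue
            gamma = dependency(rows, reduct + [attr], decision)
            if gamma > candidate_gamma:
                candidate, candidate_gamma = attr, gamma
        if candidate is None:
            # No single attribute improves the dependency; the greedy
            # search stalls. Stop here instead of looping forever.
            break
        reduct.append(candidate)
        best = candidate_gamma
    return reduct


# Hypothetical toy corpus: 1 means the term occurs in the document.
terms = ["goal", "match", "bank", "stock"]
docs = [
    {"goal": 1, "match": 1, "bank": 0, "stock": 0, "label": "sports"},
    {"goal": 1, "match": 0, "bank": 0, "stock": 0, "label": "sports"},
    {"goal": 0, "match": 1, "bank": 0, "stock": 0, "label": "sports"},
    {"goal": 0, "match": 0, "bank": 1, "stock": 1, "label": "economy"},
    {"goal": 0, "match": 0, "bank": 1, "stock": 0, "label": "economy"},
    {"goal": 0, "match": 0, "bank": 0, "stock": 1, "label": "economy"},
]

print(quick_reduct(docs, terms, "label"))  # ['goal', 'match']
```

In this toy table two of the four terms are enough to separate the categories, which is the kind of dimensionality reduction the abstract refers to. A greedy search returns only one reduct and may not find a minimal one in general, which is presumably part of the motivation for extracting multiple reducts.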
Pages: 5849-5863
Page count: 14
Related papers
50 in total
  • [31] A Term Weighting Scheme Based on the Measure of Relevance and Distinction for Text Categorization
    Yang, Jieming
    Wang, Jing
    Liu, Zhiying
    Qu, Zhaoyang
    2015 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2015, : 63 - 68
  • [32] A new term-weighting scheme for naive Bayes text categorization
    Mendoza, Marcelo
    INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2012, 8 (01) : 55 - +
  • [33] A Novel scheme for Term weighting in Text Categorization : Positive Impact factor
    Emmanuel, M.
    Khatri, Saurabh M.
    Babu, Ramesh D. R.
    2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, : 2292 - 2297
  • [34] Entropy-based Term Weighting Schemes for Text Categorization in VSM
    Wang, Tao
    Cai, Yi
    Leung, Ho-fung
    Cai, Zhiwei
    Min, Huaqing
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 325 - 332
  • [35] Supervised term weighting centroid-based classifiers for text categorization
    Nguyen, Tam T.
    Chang, Kuiyu
    Hui, Siu Cheung
    KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 35 (01) : 61 - 85
  • [37] Termset weighting by adapting term weighting schemes to utilize cardinality statistics for binary text categorization
    Badawi, Dima
    Altincay, Hakan
    APPLIED INTELLIGENCE, 2017, 47 (02) : 456 - 472
  • [38] A Study of Applying Different Term Weighting Schemes on Arabic Text Classification
    Guru, D. S.
    Ali, Mostafa
    Suhil, Mahamad
    Hazman, Maryam
    DATA ANALYTICS AND LEARNING, 2019, 43 : 293 - 305
  • [39] Imbalanced text classification: A term weighting approach
    Liu, Ying
    Loh, Han Tong
    Sun, Aixin
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (01) : 690 - 701
  • [40] A Multiple Kernel Learning Approach to Text Categorization
    Wang, Tinghua
    Xie, Haihui
    Zhong, Liyun
    Hu, Shengzhou
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2015, 12 (09) : 2121 - 2126