Enhanced Filter Feature Selection Methods for Arabic Text Categorization

Cited by: 11
Authors
Ghareb, Abdullah Saeed [1]
Abu Bakar, Azuraliza [2]
Al-Radaideh, Qasem A. [3]
Hamdan, Abdul Razak [4]
Affiliations
[1] Natl Univ Malaysia, Bangi, Malaysia
[2] Natl Univ Malaysia, Data Min, Bangi, Malaysia
[3] Yarmouk Univ, Dept Comp Informat Syst, Irbid, Jordan
[4] Natl Univ Malaysia, Comp Sci & Data Min, Bangi, Malaysia
Keywords
Arabic Text Categorization; Associative Classification; Feature Selection; Naive Bayes
DOI
10.4018/IJIRR.2018040101
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Filtering large amounts of data is an important step in data mining tasks, particularly when categorizing unstructured, high-dimensional data. A feature selection process is therefore needed to reduce the high-dimensional feature space to a small subset of relevant features that best represent the text for categorization. This article proposes three enhanced filter feature selection methods: Category Relevant Feature Measure, Modified Category Discriminated Measure, and Odd Ratio2. These methods combine relevance information about features at both the inter-category and intra-category levels. The effectiveness of the proposed methods with Naive Bayes and associative classification is evaluated using traditional text categorization measures, namely macro-averaged precision, recall, and F-measure. Experiments are conducted on three Arabic text datasets used for text categorization. The experimental results show that the proposed methods achieve better or comparable results when compared with 12 well-known traditional methods.
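The macro-averaged measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the article: the function name `macro_scores` and the toy labels are hypothetical, and the macro F-measure is assumed here to be the unweighted mean of the per-category F1 values (some authors instead compute F from the macro-averaged precision and recall).

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) for two label lists.

    Assumption: macro F1 is the unweighted mean of per-category F1 scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1      # correct assignment to the category
        else:
            fp[guess] += 1      # document wrongly assigned to `guess`
            fn[truth] += 1      # document missed for category `truth`

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Macro-averaging: every category counts equally, regardless of its size.
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical toy labels, for illustration only.
    truth = ["sport", "sport", "politics", "politics"]
    guess = ["sport", "politics", "politics", "politics"]
    print(macro_scores(truth, guess))
```

Because each category contributes equally to the average, small categories influence the macro scores as much as large ones, which is the usual reason for reporting them on unbalanced text collections.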
Pages: 1-24
Page count: 24
Related papers
50 in total
  • [1] The Hybrid Filter Feature Selection Methods for Improving High-Dimensional Text Categorization
    Le Nguyen Hoai Nam
    Ho Bao Quoc
    [J]. INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2017, 25 (02) : 235 - 265
  • [2] A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization
    Harrag, Fouzi
    El-Qawasmeh, Eyas
    Al-Salman, Abdul Malik S.
    [J]. NETWORKED DIGITAL TECHNOLOGIES, PT 2, 2010, 88 : 676 - +
  • [3] Stemming versus light stemming as feature selection techniques for Arabic text categorization
    Duwairi, Rehab
    Al-Refai, Mohammad
    Khasawneh, Natheer
    [J]. 2007 INNOVATIONS IN INFORMATION TECHNOLOGIES, VOLS 1 AND 2, 2007, : 199 - 203
  • [4] Feature Reduction Techniques for Arabic Text Categorization
    Duwairi, Rehab
    Al-Refai, Mohammad Nayef
    Khasawneh, Natheer
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2009, 60 (11) : 2347 - 2352
  • [5] An alternative framework for univariate filter based feature selection for text categorization
    Guru, D. S.
    Suhil, Mahamad
    Raju, Lavanya Narayana
    Kumar, N. Vinay
    [J]. PATTERN RECOGNITION LETTERS, 2018, 103 : 23 - 31
  • [6] Arabic Text Classification: A Review Study on Feature Selection Methods
    Hijazi, Musab Mustafa
    Zeki, Akram
    Ismail, Amelia
    [J]. 2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021, : 554 - 559
  • [7] Filter feature selection methods for text classification: a review
    Ming, Hong
    Heyong, Wang
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (1) : 2053 - 2091
  • [8] Filter feature selection methods for text classification: a review
    Hong Ming
    Wang Heyong
    [J]. Multimedia Tools and Applications, 2024, 83 : 2053 - 2091
  • [9] New Model of Feature Selection based Chaotic Firefly Algorithm for Arabic Text Categorization
    Hadni, Meryeme
    Hjiaj, Hassane
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2023, 20 (3A) : 461 - 468
  • [10] Feature selection in SVM text categorization
    Taira, H
    Haruno, M
    [J]. SIXTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-99)/ELEVENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-99), 1999, : 480 - 486