A Novel Term Weighting Scheme and an Approach for Classification of Agricultural Arabic Text Complaints

Cited by: 0
Authors
Guru, D. S. [1 ]
Ali, Mostafa [1 ]
Suhil, Mahamad [1 ]
Affiliations
[1] Univ Mysore, Dept Studies Comp Sci, Mysore, Karnataka, India
Keywords
Arabic Text Classification; Feature Extraction; Farmers' Complaints; TCW-ICF; Features selection techniques; KNN; CATEGORIZATION; SELECTION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a machine learning based approach for classifying farmers' complaints written in Arabic into different crop classes. First, the complaints are preprocessed by stop word removal, automatic word correction, and stemming so that only the content terms remain; some domain-specific special cases that may affect classification performance are also handled. A new term weighting scheme called Term Class Weight-Inverse Class Frequency (TCW-ICF) is then used to extract the most discriminating features with respect to each class. The extracted features are used to represent the preprocessed complaints as feature vectors for training a classifier. Finally, the trained classifier assigns an unlabeled complaint to one of the crop classes. In addition, a relatively large dataset of more than 5000 farmers' complaints written in Arabic script and covering eight different crops has been created. The proposed approach has been validated through extensive experiments on the newly created dataset using a KNN classifier. It is argued that the proposed approach outperforms the baseline Vector Space Model (VSM). Further, the superiority of the proposed term weighting scheme in selecting the best set of discriminating features is demonstrated through a comparative analysis against four well-known feature selection techniques. The new term weighting scheme is applied to Arabic script as a case study, but it can be applied to text data in any language.
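The pipeline described in the abstract (per-class term weighting, feature selection, vector representation, KNN classification) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the TCW-ICF formula, so the weight below is a stand-in (a term's share of its class's tokens multiplied by a log inverse class frequency). The function names, the English placeholder tokens, and the cosine-similarity KNN are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def class_term_weights(docs, labels):
    """Score each (term, class) pair. Stand-in weight, NOT the paper's TCW-ICF:
    (term count in class / tokens in class) * log(1 + n_classes / classes containing term)."""
    class_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        class_counts[label].update(tokens)
    n_classes = len(class_counts)
    class_freq = Counter()  # number of classes in which each term occurs
    for counts in class_counts.values():
        class_freq.update(counts.keys())
    weights = {}
    for label, counts in class_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            icf = math.log(1 + n_classes / class_freq[term])
            weights[(term, label)] = (count / total) * icf
    return weights


def select_features(weights, k):
    """Keep the k highest-scoring terms of each class (ties broken alphabetically)."""
    by_class = defaultdict(list)
    for (term, label), weight in weights.items():
        by_class[label].append((weight, term))
    selected = set()
    for scored in by_class.values():
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        selected.update(term for _, term in scored[:k])
    return sorted(selected)


def vectorize(tokens, features):
    """Represent a tokenized complaint as raw counts over the selected features."""
    counts = Counter(tokens)
    return [counts[term] for term in features]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already preprocessed complaints (English placeholders for Arabic stems).
train_docs = [
    ["wheat", "rust", "leaf"],
    ["wheat", "yellow", "leaf"],
    ["tomato", "fruit", "rot"],
    ["tomato", "leaf", "curl"],
]
train_labels = ["wheat", "wheat", "tomato", "tomato"]

weights = class_term_weights(train_docs, train_labels)
features = select_features(weights, k=2)
train_vectors = [vectorize(doc, features) for doc in train_docs]
prediction = knn_predict(vectorize(["wheat", "leaf", "spot"], features),
                         train_vectors, train_labels, k=3)
print(features, prediction)
```

In this sketch a term that occurs in every class (here "leaf") receives a smaller inverse-class-frequency factor than a term confined to one class, which is the general intuition behind class-based weighting; the actual scheme should be taken from the paper itself.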
Pages: 24-28
Page count: 5
Related papers
50 in total
  • [1] A Term Weighting Scheme Approach for Vietnamese Text Classification
    Vu Thanh Nguyen
    Nguyen Tri Hai
    Nguyen Hoang Nghia
    Tuan Dinh Le
    [J]. FUTURE DATA AND SECURITY ENGINEERING, FDSE 2015, 2015, 9446 : 46 - 53
  • [2] A Novel Term Weighting Scheme for Imbalanced Text Classification
    Tantisripreecha, Tanapon
    Soonthornphisaj, Nuanwan
    [J]. Informatica (Slovenia), 2022, 46 (02): : 259 - 268
  • [3] A Novel Term Weighting Scheme for Imbalanced Text Classification
    Tantisripreecha, Tanapon
    Soonthornphisaj, Nuanwan
    [J]. INFORMATICA-AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS, 2022, 46 (02): : 259 - 268
  • [4] A novel term weighting scheme for text classification: TF-MONO
    Dogan, Turgut
    Uysal, Alper Kursat
    [J]. JOURNAL OF INFORMETRICS, 2020, 14 (04)
  • [5] An improved term weighting scheme for text classification
    Tang, Zhong
    Li, Wenqiang
    Li, Yan
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (09):
  • [6] Imbalanced text classification: A term weighting approach
    Liu, Ying
    Loh, Han Tong
    Sun, Aixin
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (01) : 690 - 701
  • [7] A Study of Applying Different Term Weighting Schemes on Arabic Text Classification
    Guru, D. S.
    Ali, Mostafa
    Suhil, Mahamad
    Hazman, Maryam
    [J]. DATA ANALYTICS AND LEARNING, 2019, 43 : 293 - 305
  • [8] A NOVEL TERM WEIGHTING SCHEME MIDF FOR TEXT CATEGORIZATION
    Deisy, C.
    Gowri, M.
    Baskar, S.
    Kalaiarasi, S. M. A.
    Ramraj, N.
    [J]. JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY, 2010, 5 (01) : 94 - 107
  • [9] A novel term weighting scheme for automated text categorization
    Xu, Hongzhi
    Li, Chunping
    [J]. PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2007, : 759 - 764
  • [10] An improved supervised term weighting scheme for text representation and classification
    Tang, Zhong
    Li, Wenqiang
    Li, Yan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 189