A Chi-square Statistics Based Feature Selection Method in Text Classification

Cited by: 0
Authors
Zhai, Yujia [1]
Song, Wei
Liu, Xianjun
Liu, Lizhen
Zhao, Xinlei
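Affiliation
[1] Capital Normal University, Information Engineering College, Beijing, People's Republic of China
Keywords
text classification; feature selection; Chi-square Statistics
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract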
Text classification is the process of automatically assigning texts to categories, based on their content, within a given classification system. It mainly involves word segmentation, feature selection, weight calculation, and classification performance evaluation. Feature selection is a key step: by identifying the words most relevant to a text's content, it strongly influences classification accuracy. Text classification is an important module in text processing and is widely applied in areas such as spam filtering, news classification, sentiment classification, and part-of-speech tagging. This paper proposes a method for extracting feature words based on Chi-square Statistics. Because the informative feature words can differ depending on whether words appear together or separately, we use both single words and two-word combinations as features at the same time. Based on this method, we performed experiments with the classical Naive Bayes and Support Vector Machine classification algorithms. Comparison and analysis of the experimental results demonstrate the effectiveness of our method.
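The abstract does not state the scoring formula. A commonly used form of the chi-square statistic for a term t and a category c is built from a 2×2 table of document counts: χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)), where A is the number of documents in c that contain t, B the number outside c that contain t, C the number in c without t, D the number outside c without t, and N = A + B + C + D. The Python sketch that follows applies this formula to both single words and word pairs. It is an illustration under stated assumptions, not the authors' implementation: reading "double words" as adjacent word pairs, scoring each feature by its maximum χ² over all categories, and the helper names `extract_features`, `chi_square` and `select_features` are all hypothetical choices.

```python
from collections import Counter
from itertools import chain


def extract_features(tokens):
    """Return the set of features of one document: single words plus
    adjacent word pairs (an assumed reading of "double words")."""
    unigrams = set(tokens)
    bigrams = {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    return unigrams | bigrams


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category without the term
    d: documents outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:  # degenerate table: no evidence either way
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Score every feature against every category and keep the k features
    with the highest maximum chi-square score (assumed aggregation)."""
    doc_feats = [extract_features(tokens) for tokens in docs]
    n = len(docs)
    cat_sizes = Counter(labels)
    # Document frequency: number of documents containing each feature.
    df = Counter(chain.from_iterable(doc_feats))
    # Document frequency of each feature within each category.
    df_in_cat = Counter()
    for feats, label in zip(doc_feats, labels):
        for f in feats:
            df_in_cat[(f, label)] += 1

    scores = {}
    for f, df_f in df.items():
        best = 0.0
        for cat, size in cat_sizes.items():
            a = df_in_cat[(f, cat)]  # missing keys read as 0
            b = df_f - a
            c = size - a
            d = n - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[f] = best
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k], scores


if __name__ == "__main__":
    docs = [
        "cheap pills buy now".split(),
        "buy cheap watches now".split(),
        "meeting agenda for monday".split(),
        "project meeting notes".split(),
    ]
    labels = ["spam", "spam", "ham", "ham"]
    top, scores = select_features(docs, labels, k=5)
    print(top)  # ['buy', 'cheap', 'meeting', 'now', 'agenda']
```

The selected features could then be passed to a Naive Bayes or SVM classifier, as in the paper's experiments. Whitespace tokenization is used here only for brevity; Chinese text would first need word segmentation, the first step listed in the abstract.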
Pages: 160-163
Number of pages: 4