A Chi-square Statistics Based Feature Selection Method in Text Classification

Authors
Zhai, Yujia [1]
Song, Wei
Liu, Xianjun
Liu, Lizhen
Zhao, Xinlei
Affiliations
[1] Capital Normal Univ, Informat Engn Coll, Beijing, Peoples R China
Keywords
text classification; feature selection; Chi-square Statistics
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Text classification is the process of automatically assigning categories to texts, based on their content, within a given classification system. It typically involves several steps: word segmentation, feature selection, weight calculation, and evaluation of classification performance. Feature selection is a key step because it directly affects classification accuracy: well-chosen features indicate the relevance of text content and allow texts to be classified more reliably. Text classification is an important module in text processing and is widely applied in areas such as spam filtering, news classification, sentiment classification, and part-of-speech tagging. This paper proposes a method for extracting feature words based on Chi-square Statistics. Because feature words may behave differently when they appear together than when they appear separately, we classify texts using single words and double words as features at the same time. Using this method, we performed experiments with the classical Naive Bayes and Support Vector Machine classification algorithms. Comparison and analysis of the experimental results demonstrate the efficiency of our method.
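The abstract does not give the scoring formula, so the following is only a minimal sketch of how chi-square feature scoring over single words and double words might look. It assumes the standard chi-square statistic for term/category independence, computed from a 2x2 table of document counts, and it treats "double words" as adjacent word pairs. All function names (`extract_features`, `chi_square`, `score_features`, `top_k`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def extract_features(tokens):
    """Return the set of single words and adjacent word pairs in one document.

    Assumption: "double words" means adjacent pairs (bigrams).
    """
    unigrams = set(tokens)
    bigrams = {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    return unigrams | bigrams


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/category table.

    a: documents in the category that contain the feature
    b: documents outside the category that contain the feature
    c: documents in the category that lack the feature
    d: documents outside the category that lack the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # feature or category is constant; no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def score_features(docs, labels, category):
    """Score every single-word and double-word feature against one category."""
    n_docs = len(docs)
    n_in_cat = sum(1 for y in labels if y == category)
    df_total = Counter()   # document frequency over all documents
    df_in_cat = Counter()  # document frequency inside the category
    for tokens, y in zip(docs, labels):
        feats = extract_features(tokens)
        df_total.update(feats)
        if y == category:
            df_in_cat.update(feats)
    scores = {}
    for feat, df in df_total.items():
        a = df_in_cat[feat]
        b = df - a
        c = n_in_cat - a
        d = n_docs - n_in_cat - b
        scores[feat] = chi_square(a, b, c, d)
    return scores


def top_k(scores, k):
    """Return the k highest-scoring features, ties broken alphabetically."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Toy, already-segmented corpus (hypothetical data for illustration only).
docs = [
    ["cheap", "pills", "now"],
    ["cheap", "pills", "offer"],
    ["meeting", "agenda", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]

scores = score_features(docs, labels, "spam")
print(top_k(scores, 5))
```

The selected features would then be weighted and passed to a classifier such as Naive Bayes or a Support Vector Machine; that step is not shown here. Note that the statistic is symmetric, so a feature that is strongly associated with the other category (for example "meeting" in the toy data) scores as high as one associated with the target category.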
Pages: 160-163
Number of pages: 4