Mutual Information Using Sample Variance for Text Feature Selection

Cited by: 7
Authors
Agnihotri, Deepak [1]
Verma, Kesari [1]
Tripathi, Priyanka [2]
Affiliations
[1] Natl Inst Technol, Dept Comp Applicat, Raipur 492010, CG, India
[2] Natl Inst Tech Teachers Training & Res Bhopal, Dept Comp Engn & Applicat, Bhopal, MP, India
Keywords
Feature Selection; Text Classification; Term Frequency; Text Analysis; Text Mining; Information Retrieval; SCHEME
DOI
10.1145/3162957.3163054
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Feature selection improves the training speed of a classifier without affecting its predictive capability. It selects a subset of the most informative words (terms) from the set of all words. Term distribution affects the feature selection process: an even distribution of terms within a specific class indicates a higher association of those terms with that class, whereas an even distribution across almost all classes indicates a weaker association. This paper computes the sample variance together with the standard Mutual Information (MI) method to measure the variation in the distribution of terms. The MI method assigns a higher rank to terms concentrated in a specific category (i.e. rare terms), which shows that it favors rare terms over common terms (i.e. terms that occur frequently in almost all classes). To address this issue, a new text feature selection method named Mutual Information Using Sample Variance (MIUSV) is proposed in this paper. It considers the sample variance of the term distribution while computing the Mutual Information score of a term. Multinomial Naive Bayes (MNB) and k Nearest Neighbor (kNN) classifier models check the utility of the terms selected by the proposed MIUSV. These models classify four standard text data sets, viz. Webkb, 20Newsgroup, Ohsumed10, and Ohsumed23. Two standard performance measures, Macro-F1 and Micro-F1, show a significant improvement in the results when the proposed MIUSV method is used.
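The abstract does not state the MIUSV formula, so the following Python sketch does not attempt to reproduce it. Under that caveat, it only illustrates the two ingredients the abstract names: a standard pointwise MI score per term and class, and the sample variance of a term's distribution across classes. The function names (`pointwise_mi`, `term_scores`), the use of per-class document rates as the "distribution", and the toy counts are all assumptions made for illustration.

```python
import math
from statistics import variance  # sample variance (n - 1 denominator)


def pointwise_mi(n_tc, n_t, n_c, n):
    """Estimate log2(P(t, c) / (P(t) * P(c))) from document counts.

    n_tc: documents of class c that contain term t
    n_t:  documents (any class) that contain term t
    n_c:  documents in class c
    n:    total number of documents
    """
    if n_tc == 0:
        return float("-inf")
    return math.log2((n_tc * n) / (n_t * n_c))


def term_scores(doc_freq, class_sizes):
    """Return the best per-class MI and the sample variance for each term.

    doc_freq maps a term to a list holding, for each class, the number of
    documents in that class that contain the term (hypothetical layout).
    """
    n = sum(class_sizes)
    scores = {}
    for term, counts in doc_freq.items():
        n_t = sum(counts)
        mi = [pointwise_mi(k, n_t, n_c, n) for k, n_c in zip(counts, class_sizes)]
        # Fraction of each class's documents that contain the term.
        rates = [k / n_c for k, n_c in zip(counts, class_sizes)]
        scores[term] = {"max_mi": max(mi), "variance": variance(rates)}
    return scores


# Toy counts invented for illustration: three classes of 100 documents each.
class_sizes = [100, 100, 100]
doc_freq = {
    "rare_term": [2, 0, 0],       # appears in only two documents of one class
    "focused_term": [60, 5, 5],   # frequent in one class, scarce elsewhere
    "common_term": [50, 50, 50],  # evenly spread over all classes
}
scores = term_scores(doc_freq, class_sizes)
for term, s in scores.items():
    print(term, round(s["max_mi"], 3), round(s["variance"], 5))
```

On these toy counts, plain MI ranks `rare_term` above `focused_term` even though it occurs in only two documents, which is the bias toward rare terms that the abstract describes. The sample variance of the per-class rates orders the same terms differently (`focused_term` highest, `common_term` zero), which suggests why a variance term might help; how MIUSV actually combines the two quantities is defined in the paper itself, not here.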
Pages: 39-44
Page count: 6