Weighted average pointwise mutual information for feature selection in text categorization

Cited: 0
Authors
Schneider, KM [1]
Affiliation
[1] Univ Passau, Dept Gen Linguist, D-94030 Passau, Germany
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Mutual information is a common feature score in feature selection for text categorization. It suffers from two theoretical problems: it assumes independent word variables, and it gives longer documents higher weight in the estimation of the feature scores, in contrast to common evaluation measures that do not distinguish between long and short documents. We propose a variant of mutual information, called Weighted Average Pointwise Mutual Information (WAPMI), that avoids both problems. We provide theoretical as well as extensive empirical evidence in favor of WAPMI. Furthermore, we show that WAPMI has a useful property that other feature metrics lack: it allows the best feature-set size to be selected automatically by maximizing an objective function, which can be done with a simple heuristic, without resorting to costly methods such as EM and model selection.
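The abstract gives no formulas, so the following Python sketch is only one plausible reading of the idea: score each term by a per-document average of pointwise mutual information, so that every document (rather than every token) contributes equally. The equal per-document weights `alpha`, the Laplace smoothing, and the unigram estimates are illustrative assumptions, not the paper's exact definitions.

```python
import math
from collections import Counter

def wapmi_scores(docs, labels, alpha=None):
    """Sketch of a weighted-average pointwise MI feature score.

    docs: list of token lists; labels: parallel list of class labels.
    Each document d contributes alpha_d * p(t|d) * log(p(t|c)/p(t)),
    where c is d's class. With alpha_d = 1/N every document counts
    equally, regardless of its length (assumed weighting, for illustration).
    """
    N = len(docs)
    if alpha is None:
        alpha = [1.0 / N] * N  # equal weight per document, not per token

    vocab = {t for d in docs for t in d}
    V = len(vocab)

    # Class-conditional and marginal unigram counts, Laplace-smoothed below.
    class_counts = {c: Counter() for c in set(labels)}
    total = Counter()
    for d, y in zip(docs, labels):
        class_counts[y].update(d)
        total.update(d)
    class_sizes = {c: sum(cnt.values()) for c, cnt in class_counts.items()}
    n_total = sum(total.values())

    def p_t_given_c(t, c):
        return (class_counts[c][t] + 1) / (class_sizes[c] + V)

    def p_t(t):
        return (total[t] + 1) / (n_total + V)

    scores = {}
    for t in vocab:
        s = 0.0
        for d, y, a in zip(docs, labels, alpha):
            if not d:
                continue
            ptd = d.count(t) / len(d)  # p(t|d), maximum-likelihood estimate
            if ptd > 0:
                s += a * ptd * math.log(p_t_given_c(t, y) / p_t(t))
        scores[t] = s
    return scores
```

On a toy corpus, a term concentrated in one class scores higher than a term spread evenly across classes; features would then be ranked by this score, with the paper's automatic size selection corresponding to keeping the set that maximizes the objective.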
Pages: 252-263
Page count: 12
Related Papers (50 in total)
  • [1] Study on mutual information-based feature selection for text categorization
    Xu, Yan
    Jones, Gareth
    Li, Jintao
    Wang, Bin
    Sun, Chunming
    [J]. Journal of Computational Information Systems, 2007, 3 (03): 1007 - 1012
  • [2] Modified Pointwise Mutual Information-Based Feature Selection for Text Classification
    Georgieva-Trifonova, Tsvetanka
    [J]. PROCEEDINGS OF THE FUTURE TECHNOLOGIES CONFERENCE (FTC) 2021, VOL 2, 2022, 359 : 333 - 353
  • [3] Weighted Mutual Information for Feature Selection
    Schaffernicht, Erik
    Gross, Horst-Michael
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2011, PT II, 2011, 6792 : 181 - 188
  • [4] Pointwise mutual information sparsely embedded feature selection
    Deng, Tingquan
    Huang, Yang
    Yang, Ge
    Wang, Changzhong
    [J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2022, 151 : 251 - 270
  • [5] An Improved Feature Selection for Categorization Based on Mutual Information
    Liu, Haifeng
    Su, Zhan
    Yao, Zeqing
    Liu, Shousheng
    [J]. WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 80 - 87
  • [6] FEATURE SELECTION WITH WEIGHTED CONDITIONAL MUTUAL INFORMATION
    Celik, Ceyhun
    Bilge, Hasan Sakir
    [J]. JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2015, 30 (04): 585 - 596
  • [7] Discriminant Mutual Information for Text Feature Selection
    Wang, Jiaqi
    Zhang, Li
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 136 - 151
  • [8] Feature Selection for Text Classification Using Mutual Information
    Sel, Ilhami
    Karci, Ali
    Hanbay, Davut
    [J]. 2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [9] Improved Mutual Information Method For Text Feature Selection
    Ding Xiaoming
    Tang Yan
    [J]. PROCEEDINGS OF THE 2013 8TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2013), 2013, : 163 - 166
  • [10] Feature selection based on weighted conditional mutual information
    Zhou, Hongfang
    Wang, Xiqian
    Zhang, Yao
    [J]. APPLIED COMPUTING AND INFORMATICS, 2024, 20 (1/2): 55 - 68