An Efficient Feature Selection using Hidden Topic in Text Categorization

Cited by: 10
Authors
Zhang, Zhiwei [1]
Phan, Xuan-Hieu [1]
Horiguchi, Susumu [1]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 980, Japan
Source
2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3 | 2008
DOI
10.1109/WAINA.2008.137
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Text categorization is an important research area in information retrieval. To save storage space and improve accuracy, efficient and effective feature selection methods that reduce the data before analysis are highly desirable. Usually, research on feature selection relies only on a suitable measure such as information gain. In this paper, we propose a new feature selection method that combines hidden topic analysis with entropy-based feature ranking. Experiments on the well-known Reuters-21578 and Ohsumed datasets show that our method achieves better classification accuracy while reducing the feature dimension dramatically.
Pages: 1223-1228
Page count: 6
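
The abstract names information gain as the measure that feature selection work usually relies on. As a rough illustration of that baseline only (this record does not describe the paper's hidden topic analysis or its entropy-based ranking, so neither is reproduced here), the sketch below computes information gain for the binary feature "term occurs in document" and ranks terms by it. The toy documents, labels, and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'.

    docs is a list of sets of terms; labels is the parallel list of classes.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy that remains after splitting on presence of the term.
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


# Hypothetical toy corpus: two "grain" and two "medical" documents.
docs = [
    {"wheat", "price"},
    {"wheat", "export"},
    {"heart", "patient"},
    {"patient", "price"},
]
labels = ["grain", "grain", "medical", "medical"]

# Rank the vocabulary from most to least informative.
vocabulary = {t for d in docs for t in d}
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
print(ranked)
```

In this toy corpus "wheat" and "patient" each separate the two classes perfectly, while "price" appears once in each class and so carries no information about the class; a selection step would keep the top-ranked terms and drop the rest.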