CLDA: Feature selection for text categorization based on constrained LDA

Cited by: 7
Authors
Cui Zifeng [1]
Xu Baowen [1]
Zhang Weifeng [2]
Jiang Dawei [1]
Xu Junling [1]
Affiliations
[1] SouthEast Univ, Sch Comp Sci & Engn, Nanjing 210018, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Dept CS&E, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICSC.2007.108
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Feature selection is a necessary step before pattern classification, machine learning, and data mining. It now faces challenges in high-dimensional spaces, such as text categorization in information retrieval. Linear Discriminant Analysis (LDA) is an excellent dimensionality reduction method that transforms the original data into a low-dimensional feature space. However, it changes the original physical features and makes the features uninterpretable, which motivates us to select rather than transform features, using the LDA idea of preserving between-class and within-class structure information for text categorization. In this paper, a new feature selection approach based on Constrained LDA (CLDA) is proposed, which models feature selection as a search problem in a subspace and finds the optimal solution subject to some restrictions. Further, the CLDA optimization problem is transformed into a process of scoring and sorting features. Experiments on 20 Newsgroups and Reuters-21578 show that CLDA is consistently better than information gain and the chi-square test, with lower computational complexity.
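The abstract does not give the CLDA scoring formula, so the following is only a minimal sketch of the general "score and sort" idea, assuming a per-feature ratio of between-class to within-class scatter (a Fisher-style criterion that uses only the diagonal scatter terms). The function names `scatter_ratio_scores` and `select_top_k`, the `eps` smoothing term, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def scatter_ratio_scores(docs, labels, eps=1e-12):
    """Score each feature by between-class / within-class scatter.

    Assumption: only the diagonal scatter terms are used, so every
    feature is scored independently. This is not necessarily CLDA.
    """
    n_features = len(docs[0])
    by_class = defaultdict(list)
    for row, label in zip(docs, labels):
        by_class[label].append(row)

    overall_mean = [sum(row[j] for row in docs) / len(docs)
                    for j in range(n_features)]
    between = [0.0] * n_features
    within = [0.0] * n_features
    for rows in by_class.values():
        for j in range(n_features):
            class_mean = sum(row[j] for row in rows) / len(rows)
            # Spread of the class mean around the overall mean.
            between[j] += len(rows) * (class_mean - overall_mean[j]) ** 2
            # Spread of the class's documents around the class mean.
            within[j] += sum((row[j] - class_mean) ** 2 for row in rows)
    # eps avoids division by zero for features constant within every class.
    return [b / (w + eps) for b, w in zip(between, within)]


def select_top_k(docs, labels, k):
    """Return the indices of the k highest-scoring features."""
    scores = scatter_ratio_scores(docs, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy term-frequency rows: 4 documents, 3 terms, 2 classes (made-up data).
docs = [
    [3.0, 1.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
]
labels = [0, 0, 1, 1]
top_features = select_top_k(docs, labels, k=1)
```

In this toy data only the first term separates the two classes, so it ranks first. Because the selected columns are original terms rather than linear combinations, they stay interpretable, which is the motivation the abstract gives for selecting instead of transforming.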
Pages: 702 / +
Page count: 2