A Feature Selection Method Based on Concept Extraction and SOM Text Clustering Analysis

Cited by: 0
Authors
Wang, Lin [1 ]
Jiang, Minghu [2 ]
Liao, Shasha [2 ]
Lu, Yinghua [1 ]
Affiliations
[1] Beijing Univ Post & Telecom, Sch Elect & Engn, Beijing 100876, Peoples R China
[2] Tsinghua Univ, Sch Humanities & Social Sci, Lab Computat Linguist, Beijing 100084, Peoples R China
Keywords
Concept Attributes; Self-Organizing Map; Clustering; Text Classification
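DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract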
Feature selection is an important part of automatic classification. In this paper, we use HowNet to extract concept attributes from words to build a feature set. However, because some concept definitions are too weak in expression, we set a shielded level in the sememe tree: concept attributes that do not carry enough information for classification are filtered out, and words whose definitions are too weak in expression are retained as features themselves. In this way we build a feature set composed of both sememes from HowNet and Chinese words. When extracting sememes from a word, we also assign them different values according to their expressive ability and their relation to the word. After comparing weighting theories and their classification precision, we propose the CHI-MCOR weighting method, which is derived from two common methods. We then use the Self-Organizing Map (SOM) to perform automatic text clustering. The experimental results show that extracting the sememes properly not only reduces the feature dimension but also improves classification precision. The combined weighting method strikes a good balance between fuzzy words with high occurrence and discriminating words with medium or low occurrence, and its classification precision is higher than that of the other weighting methods. SOM can be used for large-scale text clustering, and the clustering results are good when concept features are selected. The between-cluster distance of texts represented by concept features is larger than that of texts represented by word features; the word-feature data nevertheless exhibit some clusters.
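The abstract names the SOM as the clustering step but gives no parameters. The following is a minimal, generic Kohonen SOM sketch in Python, not the authors' implementation. The function names (`train_som`, `best_matching_unit`), the grid size, the linear decay of the learning rate and neighborhood radius, and the toy feature vectors are all assumptions made for illustration; in the paper's setting each vector would hold the weights of concept (sememe) or word features for one text.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight list}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Hypothetical initialisation: uniform random weights in [0, 1].
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull the node toward x, scaled by its grid distance to the BMU.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy feature vectors (illustrative only, not data from the paper).
texts = [
    [0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0],
    [0.0, 0.9, 0.1], [0.1, 1.0, 0.0], [0.0, 0.8, 0.2],
]
som = train_som(texts, rows=1, cols=2)
clusters = [best_matching_unit(som, t) for t in texts]
```

Each text is assigned to the map node returned by `best_matching_unit`, so texts that share a node (or neighboring nodes) form a cluster. Comparing such assignments for concept-feature vectors and word-feature vectors is one way to look at the between-cluster separation the abstract describes.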
Pages: 20-28
Page count: 9