A Feature Selection Methods Based on Concept Extraction and SOM Text Clustering Analysis

Cited by: 0
Authors
Wang, Lin [1]
Jiang, Minghu [2]
Liao, Shasha [2]
Lu, Yinghua [1]
Affiliations
[1] Beijing Univ Post & Telecom, Sch Elect & Engn, Beijing 100876, Peoples R China
[2] Tsinghua Univ, Sch Humanities & Social Sci, Lab Computat Linguist, Beijing 100084, Peoples R China
Keywords
Concept Attributes; Self-Organizing Map; Clustering; Text Classification
DOI
Not available
CLC Number (Chinese Library Classification)
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Feature selection is an important part of automatic classification. In this paper, we use HowNet to extract concept attributes from words to build a feature set. However, because some concept definitions are too weak in expressive power, we set a shielded level in the sememe tree, filter out the concept attributes that cannot provide enough information for classification, and retain as features the words whose definitions are too weak. In this way, we build a feature set composed of both HowNet sememes and Chinese words. When extracting sememes from a word, we also assign different values to different sememes according to their expressive ability and their relation to the word. After comparing weighting schemes and their classification precision, we propose the CHI-MCOR weighting method, which is derived from two standard methods. We then use the Self-Organizing Map (SOM) to perform automatic text clustering. Experimental results show that if sememes are extracted properly, we can not only reduce the feature dimension but also improve classification precision. The combined weighting method strikes a good balance between fuzzy words with high occurrence and discriminating words with medium or low occurrence, and its classification precision is higher than that of the other weighting methods. SOM can be used for large-scale text clustering, and the clustering results are good when concept features are selected. The between-cluster distance of texts represented by concept features is larger than that of texts represented by word features, although the word-feature data nevertheless exhibit some clusters.
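The shielded-level idea can be pictured with a toy sememe tree. In this minimal sketch, which is an assumption and not the authors' implementation, the tree is a child-to-parent map, a sememe's level is its depth below the root, and sememes at or above a hypothetical `shield_depth` are treated as too general to help classification. A word contributes whichever of its sememes survive; if none survive (or the word is not in the lexicon), the word itself is kept, giving a mixed sememe-and-word feature set. `SEMEME_PARENT`, `LEXICON`, and their entries are invented toy data, not HowNet content, and the per-sememe weighting mentioned in the abstract is not modelled.

```python
from collections import Counter

# Hypothetical toy sememe tree (NOT real HowNet data): sememe -> parent; root -> None.
SEMEME_PARENT = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "animal": "physical",
    "human": "animal",
    "event": "entity",
    "act": "event",
    "trade": "act",
}

# Hypothetical toy lexicon: word -> sememes appearing in its concept definition.
LEXICON = {
    "商人": ["human", "trade"],  # "merchant"
    "东西": ["thing"],           # "stuff/thing": a definition too weak to be useful
}


def depth(sememe):
    """Number of edges between a sememe and the root of the toy tree."""
    d = 0
    while SEMEME_PARENT[sememe] is not None:
        sememe = SEMEME_PARENT[sememe]
        d += 1
    return d


def extract_features(tokens, shield_depth=1):
    """Map tokens to a bag of features, shielding sememes at depth <= shield_depth.

    If every sememe of a word is shielded, or the word is unknown,
    the word itself is kept as a feature instead.
    """
    features = Counter()
    for word in tokens:
        kept = [s for s in LEXICON.get(word, []) if depth(s) > shield_depth]
        if kept:
            features.update(kept)
        else:
            features[word] += 1
    return features
```

For example, `extract_features(["商人", "东西"])` turns "商人" into the sememes `human` and `trade`, while "东西", whose only sememe `thing` sits inside the shielded level, is kept as a word.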
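The abstract says only that CHI-MCOR is derived from two standard weighting methods. The "CHI" half is presumably the chi-square term-category statistic commonly used in text feature selection; the sketch below computes just that statistic from the 2x2 document contingency table. The MCOR component and the rule for combining the two are not described in the abstract, so they are left out, and the function name and argument layout are hypothetical.

```python
def chi_square(docs, labels, term, category):
    """Chi-square association between a term and a category.

    docs: list of sets of terms (one set per document)
    labels: category label of each document
    Uses chi2(t, c) = N * (A*D - C*B)**2 / ((A+C) * (B+D) * (A+B) * (C+D)), where
      A = docs in c containing t,     B = docs not in c containing t,
      C = docs in c lacking t,        D = docs not in c lacking t.
    """
    a = b = c = d = 0
    for terms, label in zip(docs, labels):
        has_term = term in terms
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero margin (e.g. the term never occurs) carries no association signal.
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0
```

A term that appears in every document of one category and in none of the others gets the largest score for a given table size, which is why the statistic favours discriminating terms.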
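A Self-Organizing Map clusters texts by fitting a grid of prototype vectors to the document vectors: each input pulls its best-matching unit (BMU) and, more weakly, the BMU's grid neighbours toward itself. The sketch below is a minimal, generic SOM under stated assumptions: documents are already encoded as equal-length weighted feature vectors, and the grid size, linearly decaying learning rate, Gaussian neighbourhood, and radius floor of 0.5 are illustrative choices, not the settings used in the paper.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the map node whose prototype is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2
    # weights[r][c] is the prototype vector of node (r, c), randomly initialised.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood on the grid, centred on the BMU.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, `best_matching_unit(weights, doc)` assigns each document to a map node, and documents sharing a node (or adjacent nodes) form a cluster; the distances between node prototypes give a rough view of between-cluster separation.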
Pages: 20-28
Number of pages: 9