Fuzzy control GA with a novel hybrid semantic similarity strategy for text clustering

Cited by: 16
Authors
Song, Wei [1, 2]
Liang, Jiu Zhen [1]
Park, Soon Cheol [2]
Affiliations
[1] Jiangnan Univ, Sch Internet Things Engn, Wuxi 214122, Jiangsu, Peoples R China
[2] Chonbuk Natl Univ, Sch Informat & Commun Engn, Jeonju 561756, Jeonbuk, South Korea
Funding
National Natural Science Foundation of China
Keywords
Clustering; WordNet; Hybrid semantic similarity; Fuzzy control; Evolutionary computation; Genetic algorithm; Optimization
DOI
10.1016/j.ins.2014.03.024
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes a fuzzy control genetic algorithm (GA) combined with a novel hybrid semantic similarity measure for document clustering. Common clustering algorithms represent documents with the vector space model (VSM), which ignores the conceptual relationships between related terms; we use semantic similarity measures to address this problem. In general, semantic similarity measures fall into two broad categories: thesaurus-based methods and corpus-based methods. In practice, however, corpus-based methods are rather complicated to handle. We propose and demonstrate a semantic space model (SSM) as the corpus-based method, in which appropriately reduced dimensions capture the true relationships between documents in terms of concepts rather than specific terms. The thesaurus-based method is then combined with our SSM as a hybrid strategy for measuring semantic similarity. In a GA, the balance between the ability to converge to an optimum and the capacity to explore new solutions affects the success of the search for the global optimum. We use a fuzzy control GA to adjust the balance between these two factors adaptively. Two text data sets, drawn from the Reuters document collection and the 20-newsgroup corpus, are used in our experiments. The results show that our fuzzy control GA combined with the hybrid semantic similarity strategy clearly outperforms the conventional GA, FCM, and K-means with the traditional cosine similarity in VSM. Moreover, the advantages of the fuzzy control GA and of our hybrid semantic strategy are demonstrated by their better performance compared with the conventional GA using the same similarity measures. © 2014 Elsevier Inc. All rights reserved.
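The hybrid strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two measures are combined, so the weighted sum and its weight `alpha` are assumptions, the truncated SVD stands in for the "appropriately reduced dimensions" of the SSM, and the small `concept_of` dictionary is a hypothetical substitute for a WordNet lookup.

```python
import numpy as np

def semantic_space(term_doc, k):
    """Corpus-based part: project documents into k latent dimensions
    with a truncated SVD (an assumed stand-in for the paper's SSM)."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per document

def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0

def thesaurus_similarity(doc_a, doc_b, concept_of):
    """Thesaurus-based part: Jaccard overlap after mapping each term to a
    concept. `concept_of` is a toy, hypothetical replacement for WordNet."""
    ca = {concept_of.get(t, t) for t in doc_a}
    cb = {concept_of.get(t, t) for t in doc_b}
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0

def hybrid_similarity(i, j, docs, space, concept_of, alpha=0.5):
    """Assumed combination rule: a weighted sum of the two measures."""
    thesaurus = thesaurus_similarity(docs[i], docs[j], concept_of)
    corpus = cosine(space[i], space[j])
    return alpha * thesaurus + (1 - alpha) * corpus

# Toy corpus: documents 0 and 1 share only the literal term "road".
docs = [
    ["car", "engine", "road"],
    ["automobile", "motor", "road"],
    ["banana", "fruit", "apple"],
]
vocab = sorted({t for d in docs for t in d})
term_doc = np.array([[d.count(t) for d in docs] for t in vocab], dtype=float)
concept_of = {"automobile": "car", "motor": "engine"}

space = semantic_space(term_doc, k=2)
vsm_01 = cosine(term_doc[:, 0], term_doc[:, 1])  # plain VSM cosine
hyb_01 = hybrid_similarity(0, 1, docs, space, concept_of)
hyb_02 = hybrid_similarity(0, 2, docs, space, concept_of)
print(vsm_01, hyb_01, hyb_02)
```

In this toy corpus the plain VSM cosine rates documents 0 and 1 as only weakly similar because they share a single literal term, while the hybrid score treats them as near-identical in concept and keeps document 2 apart. A real pipeline would replace `concept_of` with an actual thesaurus lookup and choose `k` and `alpha` empirically.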
Pages: 156-170
Number of pages: 15