Fuzzy control GA with a novel hybrid semantic similarity strategy for text clustering

Cited by: 16
Authors
Song, Wei [1 ,2 ]
Liang, Jiu Zhen [1 ]
Park, Soon Cheol [2 ]
Affiliations
[1] Jiangnan Univ, Sch Internet Things Engn, Wuxi 214122, Jiangsu, Peoples R China
[2] Chonbuk Natl Univ, Sch Informat & Commun Engn, Jeonju 561756, Jeonbuk, South Korea
Funding
National Natural Science Foundation of China
Keywords
Clustering; WordNet; Hybrid semantic similarity; Fuzzy control; Evolutionary computation; Genetic algorithm; Optimization
DOI
10.1016/j.ins.2014.03.024
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a fuzzy control genetic algorithm (GA) combined with a novel hybrid semantic similarity measure for document clustering. Common clustering algorithms represent documents with the vector space model (VSM), which ignores the conceptual relationships between related terms; we use semantic similarity measures to address this problem. Semantic similarity measures fall broadly into two kinds: thesaurus-based methods and corpus-based methods. In practice, however, corpus-based methods are rather complicated to handle. We propose and demonstrate a semantic space model (SSM) as the corpus-based method, in which appropriately reduced dimensions capture the true relationships between documents in terms of concepts rather than specific terms. The thesaurus-based method is then combined with our SSM as a hybrid strategy for measuring semantic similarity. In the GA field, the balance between the capability to converge to an optimum and the capacity to explore new solutions affects the success of the search for the global optimum. We use a fuzzy control GA to adaptively adjust the balance between these two factors. Two textual data sets, from the Reuters document collection and the 20-newsgroup corpus, are tested in our experiments. The results show that our fuzzy control GA combined with the hybrid semantic similarity strategy clearly outperforms the conventional GA, FCM, and K-means with the traditional cosine similarity in VSM. Moreover, the advantages of the fuzzy control GA and of our hybrid semantic strategy are demonstrated by their better performance in comparison with the conventional GA using the same similarity measures. (c) 2014 Elsevier Inc. All rights reserved.
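As a rough illustration of the hybrid strategy the abstract describes, the sketch below blends a thesaurus-based document similarity with a similarity computed in a reduced-dimension semantic space. Several things here are assumptions rather than details from the paper: the dimension reduction is done with a truncated SVD, the thesaurus is a hand-made term relatedness matrix standing in for WordNet, and the function names, the toy data, and the blending weight `alpha` are hypothetical.

```python
import numpy as np

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zero vectors
    Xn = X / norms
    return Xn @ Xn.T

def semantic_space_similarity(term_doc, k):
    """Document similarity in a k-dimensional space (assumed: truncated SVD)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    docs_k = (np.diag(s[:k]) @ Vt[:k, :]).T  # one row per document
    return cosine_matrix(docs_k)

def thesaurus_similarity(term_doc, term_sim):
    """Document similarity after spreading term weights over related terms.

    term_sim is a hypothetical term-by-term relatedness matrix, e.g. one
    that could be derived from WordNet.
    """
    expanded = (term_sim @ term_doc).T  # one row per document
    return cosine_matrix(expanded)

def hybrid_similarity(term_doc, term_sim, k=2, alpha=0.5):
    """Weighted blend of the two measures; alpha is an assumed parameter."""
    return (alpha * thesaurus_similarity(term_doc, term_sim)
            + (1.0 - alpha) * semantic_space_similarity(term_doc, k))

# Toy data (hypothetical). Rows: car, automobile, engine, fruit, apple.
# Columns: four short documents.
term_doc = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

term_sim = np.eye(5)
term_sim[0, 1] = term_sim[1, 0] = 0.9  # car ~ automobile
term_sim[3, 4] = term_sim[4, 3] = 0.5  # fruit ~ apple

vsm = cosine_matrix(term_doc.T)                 # plain cosine similarity in VSM
hybrid = hybrid_similarity(term_doc, term_sim)  # blended similarity
```

In this toy example the first two documents share only the term "engine", so their VSM cosine is moderate, while the hybrid score is higher because both the relatedness matrix and the reduced space treat "car" and "automobile" as close. Documents from the unrelated topic stay near zero under both measures.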
Pages: 156 - 170
Page count: 15
Related papers
50 in total
  • [31] Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization With Clustering Analysis
    Zhou, Shenghan
    Xu, Xingxing
    Liu, Yinglai
    Chang, Runfeng
    Xiao, Yiyong
    IEEE ACCESS, 2019, 7 : 107247 - 107258
  • [32] Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm
    Skabar, Andrew
    Abdalgader, Khaled
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 62 - 75
  • [33] Novel fuzzy similarity measures and their applications in pattern recognition and clustering analysis
    Singh, Surender
    Singh, Koushal
    GRANULAR COMPUTING, 2023, 8 (06) : 1715 - 1737
  • [34] Novel fuzzy similarity measures and their applications in pattern recognition and clustering analysis
    Surender Singh
    Koushal Singh
    Granular Computing, 2023, 8 : 1715 - 1737
  • [35] Extractive Multi-document Text Summarization Leveraging Hybrid Semantic Similarity Measures
    Bandaru, Rajesh
    Radhika, Dr. Y.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (09) : 844 - 852
  • [36] Research on double fuzzy control strategy for parallel hybrid electric vehicle based on GA and DP optimisation
    Xu, Qiwei
    Luo, Xiaoxiao
    Jiang, Xiaobiao
    Zhao, Meng
    IET ELECTRICAL SYSTEMS IN TRANSPORTATION, 2018, 8 (02) : 144 - 151
  • [37] Semantic Similarity-Based Clustering of Web Documents Using Fuzzy C-Means
    Avanija, J.
    Ramar, K.
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2015, 14 (03)
  • [38] Study on Classification System of Internet Text Based on Hybrid Fuzzy Clustering Theory
    Wang Xiaoyong
    Xiao Siyou
    Fang Yuefeng
    PROCEEDINGS OF INTERNATIONAL SYMPOSIUM ON IMAGE ANALYSIS & SIGNAL PROCESSING, 2009, 2009 : 131 - 135
  • [39] Experimental study on short-text clustering using transformer-based semantic similarity measure
    Abdalgader, Khaled
    Matroud, Atheer A.
    Hossin, Khaled
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [40] Experimental study on short-text clustering using transformer-based semantic similarity measure
    Abdalgader K.
    Matroud A.A.
    Hossin K.
    PeerJ Computer Science, 2024, 10