Fuzzy control GA with a novel hybrid semantic similarity strategy for text clustering

Cited by: 16
Authors
Song, Wei [1, 2]
Liang, Jiu Zhen [1]
Park, Soon Cheol [2]
Affiliations
[1] Jiangnan Univ, Sch Internet Things Engn, Wuxi 214122, Jiangsu, Peoples R China
[2] Chonbuk Natl Univ, Sch Informat & Commun Engn, Jeonju 561756, Jeonbuk, South Korea
Funding
National Natural Science Foundation of China
Keywords
Clustering; WordNet; Hybrid semantic similarity; Fuzzy control; Evolutionary computation; Genetic algorithm; Optimization
DOI
10.1016/j.ins.2014.03.024
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes a fuzzy control genetic algorithm (GA) combined with a novel hybrid semantic similarity measure for document clustering. Common clustering algorithms represent documents with the vector space model (VSM), which ignores the conceptual relationships between related terms; we use semantic similarity measures to address this problem. Semantic similarity measures fall broadly into two kinds: thesaurus-based methods and corpus-based methods. In practice, however, the corpus-based method is rather complicated to handle. We propose and demonstrate a semantic space model (SSM) as the corpus-based method, in which appropriately reduced dimensions capture the true relationships between documents in terms of concepts rather than specific terms. The thesaurus-based method is then combined with our SSM as a hybrid strategy for measuring semantic similarity. In a GA, the balance between the capability to converge to an optimum and the capacity to explore new solutions affects the success of the search for the global optimum. We use a fuzzy control GA to adaptively adjust the balance between these two factors. Two textual data sets, from the Reuters document collection and the 20-newsgroup corpus, are tested in our experiments. The results show that our fuzzy control GA combined with the hybrid semantic similarity strategy clearly outperforms the conventional GA, fuzzy c-means (FCM) and K-means with the traditional cosine similarity in VSM. Moreover, the advantages of the fuzzy control GA and of our hybrid semantic strategy are confirmed by their better performance compared with the conventional GA using the same similarity measures. (c) 2014 Elsevier Inc. All rights reserved.
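The abstract does not give the formulas behind the hybrid strategy, so the following is only a minimal sketch under stated assumptions. It assumes the reduced-dimension space can be approximated by a truncated SVD of a term-document matrix, and that the thesaurus-based and corpus-based similarities are blended linearly. The function names, the rank `k` and the weight `alpha` are hypothetical and are not taken from the paper.

```python
import numpy as np


def ssm_embed(term_doc, k):
    """Project documents into a k-dimensional space via truncated SVD.

    term_doc is an (n_terms, n_docs) weight matrix. Using SVD here is an
    assumption; the abstract only says that dimensions are reduced.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each row of the result is one document in the reduced space.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid division by zero for empty documents
    unit = x / norms
    return unit @ unit.T


def hybrid_similarity(term_doc, thesaurus_sim, k=2, alpha=0.5):
    """Blend a thesaurus-based similarity matrix with the reduced-space one.

    The linear blend with weight alpha is a hypothetical combination rule.
    thesaurus_sim is an (n_docs, n_docs) matrix supplied by the caller,
    for example derived from WordNet.
    """
    corpus_sim = cosine_matrix(ssm_embed(term_doc, k))
    return alpha * thesaurus_sim + (1.0 - alpha) * corpus_sim
```

The resulting matrix could then serve as the similarity input to any clustering routine, including a GA-based one; how the paper's fuzzy controller adjusts the GA's parameters is not specified in the abstract and is not modelled here.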
Pages: 156-170
Page count: 15