An Evolutionary Algorithm for Feature Selective Double Clustering of Text Documents

Cited by: 0
Authors
Nourashrafeddin, S. N. [1]
Milios, Evangelos [1]
Arnold, Dirk V. [1]
Affiliation
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 4R2, Canada
Keywords
Genetic algorithm; co-clustering; multiobjective optimization; text clustering; INFORMATION
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
We propose FSDC, an evolutionary algorithm for Feature Selective Double Clustering of text documents. We first cluster the terms existing in the document corpus. The term clusters are then fed into multiobjective genetic algorithms to prune non-informative terms and form sets of keyterms representing topics. Based on the topic keyterms found, representative documents for each topic are extracted. These documents are then used as seeds to cluster all documents in the dataset. FSDC is compared to some well-known co-clusterers on real text datasets. The experimental results show that our algorithm can outperform the competitors.
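The last two steps of the abstract (extracting representative documents from topic keyterms, then using them as seeds to cluster every document) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no scoring or similarity details, so the keyterm-count ranking, the cosine similarity, the function names, and the toy corpus are all assumptions. The hand-written `TOPIC_KEYTERMS` merely stand in for the output of the term clustering and multiobjective genetic algorithm stages, which are not reproduced here.

```python
import math
from collections import Counter

# Toy corpus and keyterm sets: purely hypothetical illustration data.
DOCS = [
    "genetic algorithm mutation crossover",
    "genetic algorithm population mutation",
    "crossover population genetic the",
    "document cluster text term",
    "text document cluster corpus",
    "term corpus text the",
]
# Assumed output of the earlier (omitted) term-clustering and pruning stages.
TOPIC_KEYTERMS = [
    {"genetic", "mutation", "crossover"},
    {"text", "document", "cluster"},
]

def vectorize(doc):
    """Bag-of-words term counts for a whitespace-tokenized document."""
    return Counter(doc.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pick_seeds(vectors, keyterms, n_seeds):
    """Indices of the n_seeds documents with the most keyterm occurrences."""
    scores = [sum(v[t] for t in keyterms) for v in vectors]
    # sorted() is stable, so ties keep the original document order.
    ranked = sorted(range(len(vectors)), key=lambda i: -scores[i])
    return ranked[:n_seeds]

def seeded_clustering(docs, topic_keyterms, n_seeds=1):
    """Assign each document to the topic whose seed centroid is most similar."""
    vectors = [vectorize(d) for d in docs]
    centroids = []
    for keyterms in topic_keyterms:
        centroid = Counter()
        for i in pick_seeds(vectors, keyterms, n_seeds):
            centroid.update(vectors[i])  # sum the seed documents' term counts
        centroids.append(centroid)
    return [
        max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
        for v in vectors
    ]

labels = seeded_clustering(DOCS, TOPIC_KEYTERMS, n_seeds=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy run the non-informative term "the" appears in both topics but in no seed document, so it never influences the assignment. That is the effect term pruning is meant to have, although the sketch reaches it through hand-picked keyterms rather than through a genetic algorithm.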
Pages: 446-453
Page count: 8
Related papers
50 in total
  • [21] A Framework for Medical Text Mining using a Feature Weighted Clustering Algorithm
    Chakrabarty, Anirban
    Roy, Santanu
    2013 1ST INTERNATIONAL CONFERENCE ON EMERGING TRENDS AND APPLICATIONS IN COMPUTER SCIENCE (ICETACS), 2013: 135-139
  • [22] An Enhanced Feature Selection for Text Documents
    Thatha, Venkata Nagaraju
    Babu, A. Sudhir
    Haritha, D.
    SMART INTELLIGENT COMPUTING AND APPLICATIONS, VOL 2, 2020, 160: 21-29
  • [23] Feature Weighted Clustering of Mixed Data Sets by Hybrid Evolutionary Algorithm
    Dutta, Dipankar
    Dutta, Paramartha
    Sil, Jaya
    2013 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2013
  • [24] DOCUMENTS CLUSTERING USING QUANTUM CLUSTERING ALGORITHM
    Bhagawati, Rupam
    Laskar, Sahinur Rahman
    Swain, Bhagaban
    2016 INTERNATIONAL CONFERENCE ON MICROELECTRONICS, COMPUTING AND COMMUNICATIONS (MICROCOM), 2016
  • [25] An evolutionary clustering algorithm
    Korzén, M
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING - ICAISC 2004, 2004, 3070: 426-431
  • [26] Cloud-based clustering of text documents using the GHSOM algorithm on the GridGain platform
    Sarnovsky, M.
    Ulbrik, Z.
    2013 IEEE 8TH INTERNATIONAL SYMPOSIUM ON APPLIED COMPUTATIONAL INTELLIGENCE AND INFORMATICS (SACI 2013), 2013: 309-313
  • [27] The Impacts of Singular Value Decomposition Algorithm Toward Indonesian Language Text Documents Clustering
    Jambak, Muhammad Ihsan
    Mohammed, Fathey
    Hidayati, Novita
    Efendi, Rusdi
    Primartha, Rifkie
    RECENT TRENDS IN DATA SCIENCE AND SOFT COMPUTING, IRICT 2018, 2019, 843: 173-183
  • [28] Unsupervised Feature Selection Technique Based on Genetic Algorithm for Improving the Text Clustering
    Abualigah, Laith Mohammad
    Khader, Ahamad Tajudin
    Al-Betar, Mohammed Azmi
    2016 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSIT), 2016
  • [29] A K-means Text Clustering Algorithm Based on Subject Feature Vector
    Duo, Ji
    Zhang, Peng
    Hao, Liu
    JOURNAL OF WEB ENGINEERING, 2021, 20(06): 1935-1946
  • [30] Pseudo-supervised clustering for text documents
    Maggini, M
    Rigutini, L
    Turchi, M
    IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004: 363-369