Improved fast partitional clustering algorithm for text clustering

Cited by: 3
Authors
Bejos, Sebastian [1, 2]
Feliciano-Avelino, Ivan [1]
Martinez-Trinidad, J. Fco. [1]
Carrasco-Ochoa, J. A. [1]
Affiliations
[1] Inst Nacl Astrofis Opt & Electr, Puebla, Mexico
[2] Univ Nacl Autonoma Mexico, Div Matemat & Ingn, Naucalpan, Mexico
Keywords
Document clustering; large collection; high dimensionality; k-means
DOI
10.3233/JIFS-179879
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering has become an important task for processing the large amount of textual information available on the Internet. Meanwhile, k-means is the most widely used clustering algorithm, mainly due to its simplicity and effectiveness. However, k-means becomes slow on large, high-dimensional datasets such as document collections. The FPAC algorithm was recently proposed to mitigate this problem, but its speed improvement came at the cost of lower clustering quality. For this reason, in this paper we introduce an improved FPAC algorithm which, according to our experiments on different document collections, obtains better clustering results than FPAC without greatly increasing the runtime.
Pages: 2137-2145
Page count: 9