An ellipsoidal K-means for document clustering

Cited by: 3
Authors
Dzogang, Fabon [1]
Marsala, Christophe [1]
Lesot, Marie-Jeanne [1]
Rifqi, Maria [2, 3]
Affiliations
[1] Univ Paris 06, UMR7606, LIP6, Paris, France
[2] LIP6, Paris, France
[3] Univ Pantheon Assas, Paris, France
Keywords
clustering; feature selection; spherical k-means; information retrieval
DOI
10.1109/ICDM.2012.126
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose an extension of the spherical K-means algorithm for settings where the number of data points is much smaller than the number of dimensions. We assume that the data lie in local, dense regions of the original space, and we propose to embed each cluster in its own ellipsoid. We introduce a new objective function and derive analytical solutions for both the centroids and the associated ellipsoids. Furthermore, a complexity study shows that the algorithm is of the same order as the regular K-means algorithm. Results on both synthetic and real data show the efficiency of the proposed method.
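For orientation only, the baseline that the abstract names, spherical K-means, can be sketched as below. This is not the ellipsoidal extension proposed in the paper: this record gives neither its objective function nor its update rules, so none of that is reproduced here. The function names and the `init_idx` parameter (indices of the documents used as initial centroids) are assumptions made for this illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(docs, init_idx, n_iter=20):
    """Baseline spherical K-means (illustrative sketch, not the paper's method).

    docs     : list of equal-length numeric vectors (e.g. term frequencies)
    init_idx : hypothetical parameter, indices of docs used as initial centroids
    """
    points = [normalize(d) for d in docs]
    centroids = [list(points[i]) for i in init_idx]
    k = len(centroids)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each document goes to the centroid with the
        # largest cosine similarity.
        new_labels = [
            max(range(k), key=lambda j: dot(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not change
        labels = new_labels
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Toy term-frequency vectors: two groups that use different vocabulary.
docs = [
    [3, 1, 0, 0], [4, 0, 1, 0], [2, 1, 0, 0],
    [0, 0, 3, 1], [0, 1, 4, 0], [0, 0, 2, 2],
]
labels, centroids = spherical_kmeans(docs, init_idx=[0, 3])
```

Because every vector is projected onto the unit sphere, only direction matters, which is why this baseline is common for sparse document vectors. Per the abstract, the paper's extension replaces this single shared geometry with a cluster-specific ellipsoid.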
Pages: 221-230
Page count: 10
Related papers
50 in total
  • [1] Harmony K-means algorithm for document clustering
    Mahdavi, Mehrdad
    Abolhassani, Hassan
    DATA MINING AND KNOWLEDGE DISCOVERY, 2009, 18 (03) : 370 - 391
  • [2] An Improved K-means Algorithm for Document Clustering
    Wu, Guohua
    Lin, Hairong
    Fu, Ershuai
    Wang, Liuyang
    2015 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND MECHANICAL AUTOMATION (CSMA), 2015, : 65 - 69
  • [3] Harmony K-means algorithm for document clustering
    Mehrdad Mahdavi
    Hassan Abolhassani
    Data Mining and Knowledge Discovery, 2009, 18 : 370 - 391
  • [4] Efficient Sparse Spherical k-Means for Document Clustering
    Knittel, Johannes
    Koch, Steffen
    Ertl, Thomas
    PROCEEDINGS OF THE 21ST ACM SYMPOSIUM ON DOCUMENT ENGINEERING (DOCENG '21), 2021,
  • [5] Text Document Clustering Based on Density K-means
    Wu, Di
    Zeng, Yan
    Qu, Yin-chuan
    INTERNATIONAL CONFERENCE ON COMPUTER, MECHATRONICS AND ELECTRONIC ENGINEERING (CMEE 2016), 2016,
  • [6] K-means based method for overlapping document clustering
    Beltran, Beatriz
    Vilarino, Darnes
    Martinez-Trinidad, Jose Fco.
    Carrasco-Ochoa, J. A.
    Pinto, David
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 2127 - 2135
  • [7] Improved Document Clustering using K-means Algorithm
    Bide, Pramod
    Shedge, Rajashree
    2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES, 2015,
  • [8] Document Clustering - A Feasible Demonstration with K-means Algorithm
    Arif, Wajiha
    Mahoto, Naeem Ahmed
    2019 2ND INTERNATIONAL CONFERENCE ON COMPUTING, MATHEMATICS AND ENGINEERING TECHNOLOGIES (ICOMET), 2019,
  • [9] An Improved Hierarchical K-Means Algorithm for Web Document Clustering
    Liu, Yongxin
    Liu, Zhijng
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 606 - 610
  • [10] An Approach for Document Clustering using PSO and K-means Algorithm
    Chouhan, Rashmi
    Purohit, Anuradha
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON INVENTIVE SYSTEMS AND CONTROL (ICISC 2018), 2018, : 1380 - 1384