Implementing GloVe for Context-Based k-means++ Clustering

Times cited: 0
Authors
Gupta, Akanksha [1]
Tripathy, B. K. [1]
Affiliation
[1] VIT Univ, Sch Comp Sci & Engn, Vellore, Tamil Nadu, India
Keywords
k-means++; NLP; GloVe; t-SNE; Word Embedding
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we implement a context-based form of clustering that takes a non-numeric dataset and clusters it using the word embeddings provided by the pretrained GloVe vectors. For each item in the dataset to be clustered, the corresponding word embedding is obtained from the GloVe vector representation of that word. We then perform dimensionality reduction on the data to determine a suitable number of dimensions for cluster formation. The data are then clustered using k-means++. This paper presents one way to overcome a key limitation of k-means clustering, namely its sensitivity to how the cluster centres are initialised, and hence yields better-quality clusters. On synthetic examples, standard k-means does not perform well because random seeding tends to merge clusters together, and the algorithm is then unable to split them apart. The careful seeding used by k-means++ avoids this problem and therefore usually gives results close to optimal, even on synthetic datasets.
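The pipeline summarised in the abstract (look up GloVe vectors for each item, then cluster with k-means++) can be sketched as below. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes GloVe vectors in the usual plain-text format (one word per line followed by its space-separated components) and, for self-containment, uses a tiny made-up 2-D vocabulary (`TOY_GLOVE`) instead of a real GloVe file. Averaging the vectors of multi-word items is an assumption, as the abstract does not say how such items are handled, and the dimensionality-reduction step is omitted. The seeding follows the standard k-means++ rule: each new centre is sampled with probability proportional to its squared distance from the nearest centre already chosen, followed by ordinary Lloyd iterations.

```python
import random

# Hypothetical toy "GloVe" data: made-up 2-D vectors, NOT real GloVe values.
TOY_GLOVE = """apple 0.9 0.1
banana 0.8 0.2
mango 0.85 0.15
car 0.1 0.9
truck 0.15 0.85
bus 0.2 0.8
"""


def load_glove(lines):
    """Parse GloVe text format: 'word v1 v2 ... vd' on each line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed(item, vectors):
    """Embed an item by averaging the vectors of its known words (assumption)."""
    words = [w for w in item.lower().split() if w in vectors]
    if not words:
        raise KeyError(f"no GloVe vector for {item!r}")
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, rng):
    """k-means++ seeding: sample centres with D^2 weighting."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres = [rng.choice(points)]  # first centre: uniform at random
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:  # every point coincides with an existing centre
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans_pp(points, k, iters=100, seed=0):
    """k-means++ seeding followed by standard Lloyd iterations."""
    rng = random.Random(seed)
    centres = kmeans_pp_seed(points, k, rng)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


if __name__ == "__main__":
    # For a real GloVe file: with open(path, encoding="utf-8") as f: load_glove(f)
    vecs = load_glove(TOY_GLOVE.splitlines())
    items = ["apple", "banana", "mango", "car", "truck", "bus"]
    points = [embed(item, vecs) for item in items]
    labels, _ = kmeans_pp(points, k=2, seed=42)
    for item, label in zip(items, labels):
        print(label, item)
```

On the toy data the fruit words and the vehicle words end up in separate clusters. The keywords list t-SNE, so a reduction step such as scikit-learn's `sklearn.manifold.TSNE(...).fit_transform(...)` could be applied to the embedded points before clustering, although the abstract does not state which method or target dimension the authors used.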
Pages: 1041-1046
Number of pages: 6