Implementing GloVe for Context Based k-means++ Clustering

Times cited: 0
Authors
Gupta, Akanksha [1]
Tripathy, B. K. [1]
Affiliations
[1] VIT Univ, Sch Comp Sci & Engn, Vellore, Tamil Nadu, India
Keywords
k-means++; NLP; GloVe; t-SNE; Word Embedding
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we implement a form of clustering that takes a non-numeric dataset and clusters it with the help of the word embeddings provided by the GloVe dataset. For each item in the dataset to be clustered, the corresponding word embedding is obtained from the GloVe vector representation of that word. We then perform dimensionality reduction on the embedded data to obtain a suitable number of dimensions for cluster formation, and cluster the reduced data using k-means++. This approach provides one way to overcome the limitation of k-means clustering in initialising the cluster centres, and hence gives better-quality clusters. On synthetic examples, standard k-means does not perform well, because random seeding tends to merge clusters together and the algorithm is then unable to split them apart. The careful seeding used by k-means++ prevents this problem and hence usually gives optimal results, even on synthetic datasets.
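The seeding idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeanspp_seeds`, `kmeans`) are hypothetical, the 2-D vectors in `toy_vectors` are invented stand-ins for dimension-reduced GloVe embeddings (they are not real GloVe values), and the dimensionality-reduction step is skipped entirely. The sketch only shows how k-means++ picks each new centre with probability proportional to its squared distance from the nearest centre already chosen, and then runs ordinary k-means iterations.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Choose k initial centres using k-means++ (D^2-weighted) seeding."""
    centres = [rng.choice(points)]  # first centre: uniform at random
    while len(centres) < k:
        # Weight each point by its squared distance to the nearest centre so far,
        # so far-away points are much more likely to become the next centre.
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def kmeans(points, k, iters=20, seed=0):
    """Standard k-means iterations started from k-means++ seeds."""
    rng = random.Random(seed)
    centres = kmeanspp_seeds(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centres


# ASSUMPTION: made-up 2-D vectors standing in for reduced GloVe embeddings.
toy_vectors = {
    "apple": (0.9, 0.1), "banana": (1.0, 0.2), "mango": (0.8, 0.0),
    "car": (-1.0, 1.0), "truck": (-0.9, 1.1), "bus": (-1.1, 0.9),
}
words = list(toy_vectors)
labels, _ = kmeans([toy_vectors[w] for w in words], k=2)
clusters = dict(zip(words, labels))
print(clusters)
```

In a real pipeline the toy dictionary would be replaced by vectors looked up from a GloVe file and reduced with a method such as t-SNE (named in the keywords); those steps are left out here to keep the sketch dependency-free.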
Pages: 1041-1046
Number of pages: 6