Implementing GloVe for Context Based k-means++ Clustering

Cited by: 0
Authors
Gupta, Akanksha [1]
Tripathy, B. K. [1]
Affiliations
[1] VIT Univ, Sch Comp Sci & Engn, Vellore, Tamil Nadu, India
Source
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTELLIGENT SUSTAINABLE SYSTEMS (ICISS 2017) | 2017
Keywords
k-means++; NLP; GloVe; t-SNE; Word Embedding
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we implement a unique form of clustering that takes a non-numeric data set and clusters it with the help of the word embeddings provided by the GloVe dataset. For each item in the dataset to be clustered, the related word embedding is generated from the GloVe vector representation of that word. We then perform dimensionality reduction on the data set to obtain an appropriate number of dimensions for cluster formation. The data is then clustered using k-means++. This paper presents one way to overcome the limitation of k-means clustering in initialising the cluster centres, and hence obtains better-quality clusters. On synthetic examples, the k-means method does not perform well, because random seeding inevitably merges clusters together and the algorithm is then unable to split them apart. The careful seeding method used by k-means++ prevents this problem and hence usually gives optimal results even on synthetic datasets.
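The seeding step that the abstract credits for the improved clusters can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two-dimensional vectors in `TOY_VECTORS` are made-up stand-ins for GloVe embeddings after dimensionality reduction, and the names `kmeans_pp_seeds`, `kmeans`, and `cluster_words` are hypothetical. Each new centre is drawn with probability proportional to its squared distance from the nearest centre already chosen, which makes it unlikely that two initial centres land in the same natural cluster.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Choose k initial centres with the k-means++ D^2 weighting rule.

    Assumes `points` holds at least k distinct vectors; otherwise all
    weights can become zero and the weighted draw fails.
    """
    centres = [rng.choice(points)]  # first centre: uniform at random
    while len(centres) < k:
        # Weight of a point = squared distance to its nearest chosen centre.
        weights = [min(squared_distance(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd iterations started from k-means++ seeds."""
    centres = kmeans_pp_seeds(points, k, rng)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
    labels = [
        min(range(k), key=lambda i: squared_distance(p, centres[i])) for p in points
    ]
    return centres, labels


# Hypothetical 2-D vectors standing in for reduced GloVe embeddings.
TOY_VECTORS = {
    "apple": (0.0, 0.1), "banana": (0.2, 0.0), "cherry": (0.1, 0.2),
    "car": (50.0, 50.1), "truck": (50.2, 50.0), "bus": (50.1, 50.2),
    "piano": (100.0, 0.1), "guitar": (100.2, 0.0), "violin": (100.1, 0.2),
}


def cluster_words(vectors, k, seed=0):
    """Map each word to a cluster label using the sketch above."""
    rng = random.Random(seed)
    words = list(vectors)
    points = [vectors[w] for w in words]
    _, labels = kmeans(points, k, rng)
    return dict(zip(words, labels))


if __name__ == "__main__":
    for word, label in cluster_words(TOY_VECTORS, k=3).items():
        print(label, word)
```

Replacing the weighted draw in `kmeans_pp_seeds` with a uniform `rng.choice` gives ordinary random seeding, which is the variant the abstract describes as prone to merging clusters.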
Pages: 1041-1046
Page count: 6