Improved Constrained k-Means Algorithm for Clustering with Domain Knowledge

被引:5
|
作者
Huang, Peihuang [1 ]
Yao, Pei [2 ]
Hao, Zhendong [2 ]
Peng, Huihong [2 ]
Guo, Longkun [3 ]
机构
[1] Minjiang Univ, Coll Math & Data Sci, Fuzhou 350116, Peoples R China
[2] Fuzhou Univ, Coll Math & Comp Sci, Fuzhou 350116, Peoples R China
[3] Qilu Univ Technol, Sch Comp Sci, Jinan 250353, Peoples R China
关键词
constrained k-means; minimum weight matching; side information; domain knowledge; CRITERIA;
D O I
10.3390/math9192390
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Witnessing the tremendous development of machine learning technology, emerging machine learning applications impose challenges of using domain knowledge to improve the accuracy of clustering provided that clustering suffers a compromising accuracy rate despite its advantage of fast procession. In this paper, we model domain knowledge (i.e., background knowledge or side information), respecting some applications as must-link and cannot-link sets, for the sake of collaborating with k-means for better accuracy. We first propose an algorithm for constrained k-means, considering only must-links. The key idea is to consider a set of data points constrained by the must-links as a single data point with a weight equal to the weight sum of the constrained points. Then, for clustering the data points set with cannot-link, we employ minimum-weight matching to assign the data points to the existing clusters. At last, we carried out a numerical simulation to evaluate the proposed algorithms against the UCI datasets, demonstrating that our method outperforms the previous algorithms for constrained k-means as well as the traditional k-means regarding the clustering accuracy rate although with a slightly compromised practical runtime.
引用
收藏
页数:14
相关论文
共 50 条
  • [1] Research on k-means Clustering Algorithm An Improved k-means Clustering Algorithm
    Shi Na
    Liu Xumin
    Guan Yong
    [J]. 2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010, : 63 - 67
  • [2] An Improved K-means Clustering Algorithm
    Wang Yintong
    Li Wanlong
    Gao Rujia
    [J]. 2012 WORLD AUTOMATION CONGRESS (WAC), 2012,
  • [3] Improved K-means clustering algorithm
    Zhang, Zhe
    Zhang, Junxi
    Xue, Huifeng
    [J]. CISP 2008: FIRST INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOL 5, PROCEEDINGS, 2008, : 169 - 172
  • [4] An improved K-means clustering algorithm
    Huang, Xiuchang
    Su, Wei
    [J]. Journal of Networks, 2014, 9 (01) : 161 - 167
  • [5] Improved Algorithm for the k-means Clustering
    Zhang, Sheng
    Wang, Shouqiang
    [J]. PROCEEDINGS OF THE 10TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA 2012), 2012, : 4717 - 4720
  • [6] Research on Improved K-means Clustering Algorithm
    Zhang, Yinsheng
    Shan, Huilin
    Li, Jiaqiang
    Zhou, Jie
    [J]. MEMS, NANO AND SMART SYSTEMS, PTS 1-6, 2012, 403-408 : 1977 - 1980
  • [7] An Improved Kernel K-means Clustering Algorithm
    Liu, Yang
    Yin, Hong Peng
    Chai, Yi
    [J]. PROCEEDINGS OF 2016 CHINESE INTELLIGENT SYSTEMS CONFERENCE, VOL I, 2016, 404 : 275 - 280
  • [8] An Improved K-means Algorithm for Document Clustering
    Wu, Guohua
    Lin, Hairong
    Fu, Ershuai
    Wang, Liuyang
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND MECHANICAL AUTOMATION (CSMA), 2015, : 65 - 69
  • [9] Clustering Using Boosted Constrained k-Means Algorithm
    Okabe, Masayuki
    Yamada, Seiji
    [J]. FRONTIERS IN ROBOTICS AND AI, 2018, 5
  • [10] A Clustering K-means Algorithm Based on Improved PSO Algorithm
    Tan, Long
    [J]. 2015 FIFTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT2015), 2015, : 940 - 944