An Active Learning Algorithm Based on Shannon Entropy for Constraint-Based Clustering

Cited by: 4
Authors
Chen, Duo Wen [1]
Jin, Ying Hua [1]
Affiliations
[1] Guangdong Univ Technol, Sch Appl Math, Guangzhou 510520, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Funding
National Natural Science Foundation of China;
Keywords
Clustering algorithms; Uncertainty; Entropy; Skeleton; Inspection; Measurement uncertainty; Semi-supervised clustering; pairwise constraint; active learning; entropy; skeleton set; CROSS-ENTROPY; K-MEANS; SELECTION;
DOI
10.1109/ACCESS.2020.3025036
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Pairwise constraints can enhance clustering performance in constraint-based clustering problems, especially when the constraints are informative. This paper proposes a novel active learning algorithm that formulates informative pairwise constraints efficiently and economically. The algorithm consists of three phases: Selecting, Exploring, and Consolidating. In the Selecting phase, an unsupervised clustering algorithm is used to obtain an informative data set in terms of Shannon entropy. In the Exploring phase, a farthest-first strategy is used to construct a series of queries over the informative data set, with the aim of building the skeleton set structure of the clustering; informative pairwise constraints are collected at the same time. If the number of skeleton sets equals the number of clusters, the algorithm enters the third phase, Consolidating; otherwise, it terminates. In the Consolidating phase, the non-skeleton points in the informative data set are queried against the representative points of the skeleton sets constructed in the Exploring phase, and a priority principle is proposed to help collect more must-link pairwise constraints. With the well-known MPCK-means (metric pairwise constrained K-means) as the underlying constraint-based semi-supervised clustering algorithm, the new algorithm is compared experimentally with its counterparts. The experimental results show that the new algorithm achieves a significant improvement.
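The Selecting and Exploring phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact uncertainty measure or query rules, so the softmax-over-distances membership, the function names (`membership_entropy`, `select_informative`, `explore`), the `oracle` callback, and the `max_queries` budget are all assumptions made for illustration. The Consolidating phase and its priority principle are not sketched.

```python
import numpy as np

def membership_entropy(X, centers):
    """Shannon entropy of a soft cluster membership for each point.

    Assumption: memberships are a softmax over negative squared distances
    to the cluster centers; the abstract does not specify the measure.
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -(d2 - d2.min(axis=1, keepdims=True))  # shift for stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def select_informative(X, centers, m):
    """Indices of the m points with the highest membership entropy."""
    return np.argsort(-membership_entropy(X, centers))[:m]

def explore(X, candidates, oracle, k, max_queries):
    """Farthest-first exploration over the candidate indices.

    oracle(i, j) returns True for must-link and False for cannot-link
    (a hypothetical stand-in for the human annotator).
    Returns the skeleton sets and the constraints collected so far.
    """
    candidates = list(candidates)
    skeletons = [[candidates[0]]]
    visited = [candidates[0]]
    must, cannot, queries = [], [], 0
    while len(skeletons) < k and len(visited) < len(candidates):
        rest = [i for i in candidates if i not in visited]
        # Farthest-first: maximise the distance to the nearest visited point.
        dist = [min(np.linalg.norm(X[i] - X[j]) for j in visited) for i in rest]
        x = rest[int(np.argmax(dist))]
        visited.append(x)
        placed = False
        for s in skeletons:
            if queries >= max_queries:
                return skeletons, must, cannot
            queries += 1
            if oracle(x, s[0]):  # query against the set's representative
                must.append((x, s[0]))
                s.append(x)
                placed = True
                break
            cannot.append((x, s[0]))
        if not placed:
            skeletons.append([x])  # cannot-link to every set: new skeleton set
    return skeletons, must, cannot
```

In this sketch, the centers would come from any unsupervised run (for example K-means), `select_informative` would supply the candidate indices, and the returned must-link and cannot-link pairs would be passed to a constraint-based clusterer such as MPCK-means.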
Pages: 171447-171456
Number of pages: 10