An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining

Cited by: 1
Authors
Miloudi, Salim [1]
Wang, Yulin [1]
Ding, Wenjia [1]
Affiliation
[1] Wuhan University, School of Computer Science, Wuhan 430072, People's Republic of China
Keywords
coordinate descent; clustering; multi-database mining; fuzziness; binary entropy loss; similarity matrix; classification; rules
DOI
10.3390/e23050553
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline code
0702
Abstract
Clustering algorithms for multi-database mining (MDM) rely on computing $(n^2-n)/2$ pairwise similarities between $n$ multiple databases to generate and evaluate $m \in [1, (n^2-n)/2]$ candidate clusterings in order to select the ideal partitioning that optimizes a predefined goodness measure. However, when these pairwise similarities are distributed around the mean value, the clustering algorithm becomes indecisive when choosing which database pairs are eligible to be grouped together. Consequently, a trivial result is produced: either all $n$ databases are put in one cluster, or $n$ singleton clusters are returned. To tackle this problem, we propose a learning algorithm that reduces the fuzziness of the similarity matrix by minimizing a weighted binary entropy loss function via gradient descent and back-propagation. As a result, the learned model improves the certainty of the clustering algorithm by correctly identifying the optimal database clusters. Additionally, in contrast to gradient-based clustering algorithms, which are sensitive to the choice of the learning rate and require more iterations to converge, we propose a learning-rate-free algorithm that assesses the candidate clusterings generated on the fly in fewer, upper-bounded iterations. To achieve this, we use coordinate descent (CD) and back-propagation to search for the optimal clustering of the $n$ multiple databases in a way that minimizes a convex clustering quality measure $L(\theta)$ in fewer than $(n^2-n)/2$ iterations. By using a max-heap data structure within our CD algorithm, we optimally choose the largest weight variable $\theta_{p,q}^{(i)}$ at each iteration $i$, such that taking the partial derivative of $L(\theta)$ with respect to $\theta_{p,q}^{(i)}$ allows us to attain the next steepest descent minimizing $L(\theta)$ without using a learning rate. Through a series of experiments on multiple database samples, we show that our algorithm outperforms the existing clustering algorithms for MDM.
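The abstract does not give the exact form of $L(\theta)$, the weighted loss, or the CD update, so the sketch below is not the authors' algorithm. It is a minimal Python illustration, under stated assumptions, of two ideas the abstract relies on: (i) scoring the fuzziness of a similarity matrix by the mean binary entropy of its $(n^2-n)/2$ off-diagonal entries, which peaks when similarities sit at 0.5, and (ii) using a max-heap to visit database pairs in decreasing similarity so that candidate clusterings are generated and scored on the fly in at most $(n^2-n)/2$ iterations. The functions `fuzziness`, `quality`, `candidate_clusterings`, and `best_clustering` are hypothetical names, and the goodness score (mean intra-cluster minus mean inter-cluster similarity) is an assumed stand-in for the paper's quality measure.

```python
import heapq
import math
from collections.abc import Iterator
from itertools import combinations


def binary_entropy(s: float) -> float:
    """Binary entropy H(s) in bits: 1.0 at s = 0.5 (most fuzzy), 0.0 at s = 0 or 1."""
    if s <= 0.0 or s >= 1.0:
        return 0.0
    return -(s * math.log2(s) + (1.0 - s) * math.log2(1.0 - s))


def fuzziness(sim: list[list[float]]) -> float:
    """Hypothetical fuzziness score: mean binary entropy of the (n^2 - n)/2
    off-diagonal similarities (assumes n >= 2 and values in [0, 1])."""
    pairs = list(combinations(range(len(sim)), 2))
    return sum(binary_entropy(sim[p][q]) for p, q in pairs) / len(pairs)


def _mean(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def quality(sim: list[list[float]], labels: list[int]) -> float:
    """Hypothetical goodness score (higher is better): mean intra-cluster
    similarity minus mean inter-cluster similarity. NOT the paper's L(theta)."""
    intra, inter = [], []
    for p, q in combinations(range(len(sim)), 2):
        (intra if labels[p] == labels[q] else inter).append(sim[p][q])
    return _mean(intra) - _mean(inter)


def candidate_clusterings(sim: list[list[float]]) -> Iterator[list[int]]:
    """Yield candidate partitions by visiting database pairs in decreasing
    similarity via a max-heap; each pair joining two different clusters
    produces a new candidate. The heap holds (n^2 - n)/2 pairs, bounding the pops."""
    n = len(sim)
    parent = list(range(n))  # union-find forest over database indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # heapq is a min-heap, so negate similarities to pop the largest first.
    heap = [(-sim[p][q], p, q) for p, q in combinations(range(n), 2)]
    heapq.heapify(heap)
    merges = 0
    while heap and merges < n - 1:  # stop once everything is one cluster
        _, p, q = heapq.heappop(heap)
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rq] = rp
            merges += 1
            yield [find(x) for x in range(n)]  # label = cluster root


def best_clustering(sim: list[list[float]]) -> tuple[list[int], float]:
    """Score every candidate on the fly and keep the best (singletons as baseline)."""
    best_labels = list(range(len(sim)))
    best_score = quality(sim, best_labels)
    for labels in candidate_clusterings(sim):
        score = quality(sim, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


if __name__ == "__main__":
    # Toy 4x4 similarity matrix: {DB0, DB1} and {DB2, DB3} are clearly separated.
    sim = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.1, 0.2],
        [0.2, 0.1, 1.0, 0.8],
        [0.1, 0.2, 0.8, 1.0],
    ]
    # Maximally fuzzy matrix: every pair sits at 0.5.
    fuzzy = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
    for name, m in (("sim", sim), ("fuzzy", fuzzy)):
        labels, score = best_clustering(m)
        print(name, "fuzziness:", round(fuzziness(m), 3),
              "labels:", labels, "score:", round(score, 3))
```

With this toy score, the well-separated matrix splits into {DB0, DB1} and {DB2, DB3}, whereas the matrix whose off-diagonal entries all equal 0.5 (fuzziness 1.0) collapses into a single cluster, which mirrors the trivial result described in the abstract. Reducing fuzziness before clustering, as the paper proposes, targets exactly this failure mode.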
Pages: 37
Related papers (showing 10 of 50)
  • [1] An Improved Database Classification Algorithm for Multi-database Mining
    Li, Hong; Hu, XueGang; Zhang, YanMing
    Frontiers in Algorithmics, Proceedings, 2009, 5598: 346+.
  • [2] A Gradient-Based Clustering for Multi-Database Mining
    Miloudi, Salim; Wang, Yulin; Ding, Wenjia
    IEEE Access, 2021, 9: 11144-11172.
  • [3] An Optimized Graph-based Clustering for Multi-database Mining
    Miloudi, Salim; Wang, Yulin; Ding, Wenjia
    2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), 2020: 807-812.
  • [4] Multi-database mining
    Siadaty, Mir S.; Harrison, James H.
    Clinics in Laboratory Medicine, 2008, 28(1): 73+.
  • [5] A Study of Negative Association Rules Mining Algorithm Based on Multi-Database
    Peng, Xushan; Cheng, Ping; Wang, Maoji
    Proceedings of the 2nd International Conference on Computer and Information Applications (ICCIA 2012), 2012: 1658-1661.
  • [6] Database classification for multi-database mining
    Wu, XD; Zhang, CQ; Zhang, SC
    Information Systems, 2005, 30(1): 71-88.
  • [7] A Similarity-Based Clustering Algorithm for Fuzzy Data
    Hung, Wen-Liang; Yang, Miin-Shen
    2010 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2010), 2010.
  • [8] A similarity-based soft clustering algorithm for documents
    Lin, KI; Kondadadi, R
    Seventh International Conference on Database Systems for Advanced Applications, Proceedings, 2001: 40-47.
  • [9] An Improved Group Similarity-Based Association Rule Mining Algorithm in Complex Scenes
    Duan, Guiduo; Wang, Xiaotong; Huang, Tianxi; Kurths, Jurgen
    International Journal of Pattern Recognition and Artificial Intelligence, 2020, 34(2).
  • [10] Subspace Similarity-based Algorithm for Combine Multiple Clustering
    Xu, Sen; Li, Xianfeng; Chen, Rong; Wu, Shuang; Ni, Jun
    2013 Seventh International Conference on Internet Computing for Engineering and Science (ICICSE 2013), 2013: 69-76.