Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

Cited by: 3
Authors
Leng, Mingwei [1]
Cheng, Jianjun [1]
Wang, Jinjin [1]
Zhang, Zhengquan [2]
Zhou, Hanhai [1]
Chen, Xiaoyun [1]
Affiliations
[1] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou 730000, Peoples R China
[2] Gansu Comp Ctr, Lanzhou 730000, Peoples R China
DOI
10.1155/2013/641927
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Most existing semisupervised clustering algorithms that rely on a small labeled dataset achieve low accuracy on multidensity and imbalanced datasets, and labeling data is expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering for multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active data-selection mechanism to minimize the amount of labeled data required, and it uses multiple thresholds to expand the labeled dataset on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to evaluate the proposed algorithm, and the experimental results show that it achieves higher accuracy and more stable performance than other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
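The abstract does not spell out the expansion procedure, so the following is only an illustrative sketch of the general idea of growing a labeled set with one distance threshold per class rather than a single global one. It is not the authors' algorithm: the function name `expand_labels`, the nearest-labeled-neighbor rule, and the hand-picked threshold values are all assumptions made for this example, and the active selection step is not modeled.

```python
import math

def expand_labels(points, labels, thresholds):
    """Hypothetical helper: repeatedly let each unlabeled point adopt the
    label of its nearest labeled point, but only if that distance is within
    the threshold assigned to the neighbor's class.

    points:     list of coordinate tuples
    labels:     list of class ids, with None for unlabeled points
    thresholds: dict mapping class id -> maximum propagation distance
    """
    labels = list(labels)  # work on a copy
    changed = True
    while changed:
        labeled = [i for i, c in enumerate(labels) if c is not None]
        newly = {}
        for i, c in enumerate(labels):
            if c is not None or not labeled:
                continue
            # nearest point that already carries a label
            j = min(labeled, key=lambda k: math.dist(points[i], points[k]))
            if math.dist(points[i], points[j]) <= thresholds[labels[j]]:
                newly[i] = labels[j]
        # apply this round's assignments together, then try another round
        for i, c in newly.items():
            labels[i] = c
        changed = bool(newly)
    return labels

# Toy data (made up): a dense cluster with spacing 0.1 and a sparse one with spacing 1.0.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (10.0,), (11.0,), (12.0,), (13.0,)]
seeds = [0, None, None, None, 1, None, None, None]

per_class = expand_labels(points, seeds, {0: 0.15, 1: 1.5})
single = expand_labels(points, seeds, {0: 0.15, 1: 0.15})
print(per_class)
print(single)
```

In this toy setting, a single tight threshold suits the dense cluster but leaves the sparse cluster unexpanded, while per-class thresholds let both grow from one seed each.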
Pages: 10
Related papers
50 in total
  • [1] Semisupervised clustering algorithm combining SUBCLU and constrained clustering for detecting groups in high dimensional datasets
    Alexander Calvo-Valverde, Luis
    Vallejos-Pena, Alonso
    [J]. TECNOLOGIA EN MARCHA, 2018, 31 (03): : 74 - 85
  • [2] NETWORK CLUSTERING BY ADVANCED LABEL PROPAGATION ALGORITHM
    Zalik, Krista Rizman
    Zalik, Borut
    [J]. KDIR 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2011, : 444 - 447
  • [3] GABoost: A Clustering Based Undersampling Algorithm for Highly Imbalanced Datasets Using Genetic Algorithm
    Ajilisa, O. A.
    Jagathyraj, V. P.
    Sabu, M. K.
    [J]. INNOVATIONS IN BIO-INSPIRED COMPUTING AND APPLICATIONS, 2019, 939 : 235 - 246
  • [4] Active Learning for Imbalanced Datasets
    Aggarwal, Umang
    Popescu, Adrian
    Hudelot, Celine
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1417 - 1426
  • [5] A Novel Differential Evolution-Clustering Hybrid Resampling Algorithm on Imbalanced Datasets
    Chen, Leichen
    Cai, Zhihua
    Chen, Lu
    Gu, Qiong
    [J]. THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, 2010, : 81 - 85
  • [6] Label matrix normalization for semisupervised learning from imbalanced Data
    Li, Fengqi
    Li, Guangming
    Yang, Nanhai
    Xia, Feng
    Yu, Chuang
    [J]. NEW REVIEW OF HYPERMEDIA AND MULTIMEDIA, 2014, 20 (01) : 5 - 23
  • [7] An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets
    Zhang, Kang
    Gu, Xingsheng
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
  • [8] Label Propagation Clustering Algorithm Based on Adaptive Angle
    Du, Hui
    Zhang, Manjie
    Wang, Zhihe
    Zhai, Qiaofeng
    Cao, Xuyan
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [9] dFC: A Data-density-aware Fuzzy Clustering Algorithm for Imbalanced Biomedical Datasets
    Wang, Jin
    You, Lei
    Fan, Wenjie
    Miao, Fang
    Yang, Tao
    [J]. PROCEEDINGS OF 2017 8TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2017), 2017, : 563 - 566
  • [10] Subspace clustering and label propagation for active feedback in image retrieval
    Qin, T
    Liu, TY
    Zhang, XD
    Ma, WY
    Zhang, HJ
    [J]. 11TH INTERNATIONAL MULTIMEDIA MODELLING CONFERENCE, PROCEEDINGS, 2005, : 172 - 179