A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics

Cited by: 8
Authors
Liao, Longlong [1, 2]
Li, Kenli [3]
Li, Keqin [4]
Yang, Canqun [1, 2]
Tian, Qi [5]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Sanyi Rd, Changsha, Hunan, Peoples R China
[2] State Key Lab High Performance Comp, Sanyi Rd, Changsha, Hunan, Peoples R China
[3] Hunan Univ, Coll Informat Sci & Engn, Lushan Rd, Changsha, Hunan, Peoples R China
[4] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
[5] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX USA
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Density clustering; Matrix completion; Unsupervised multiple kernel learning; Dimensionality reduction; Outlier detection; Norm
DOI
10.1186/s12918-018-0630-6
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Many bioinformatics datasets available for clustering are incomplete, i.e., some data samples lack attribute values that clustering algorithms require. A variety of clustering algorithms have been proposed in recent years, but they are usually limited to clustering complete datasets. Moreover, conventional clustering algorithms cannot achieve a good trade-off between the accuracy and efficiency of the clustering process, because many essential parameters are set according to the user's experience. Results: The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets, called MKDCI. MKDCI consists of recovering the missing attribute values of the input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel built from multiple basis kernels, detecting cluster centroids with the Isolation Forests method, assigning clusters of arbitrary shape, and visualizing the results. Conclusions: Extensive experiments on several well-known clustering datasets in the bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, MKDCI tends to automatically produce better-quality clusters on incomplete bioinformatics datasets.
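The stages listed in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation: it assumes scikit-learn is available, and every stage is a simple stand-in chosen for brevity. Column-mean imputation replaces matrix completion, uniform kernel weights replace the learned kernel combination, and a density-peaks style assignment is assumed for the clustering step. The function name `mkdci_like_sketch`, the `gammas` values, and the 5th-percentile cutoff `d_c` are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import IsolationForest
from sklearn.impute import SimpleImputer
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel


def mkdci_like_sketch(X, gammas=(0.1, 1.0, 10.0), n_components=2, seed=0):
    """Illustrative pipeline only; each stage is a stand-in, not the paper's method."""
    # 1. Recover missing values (column means stand in for matrix completion).
    X_full = SimpleImputer(strategy="mean").fit_transform(X)

    # 2. Combine several RBF basis kernels (uniform weights stand in for learned ones).
    K = np.mean([rbf_kernel(X_full, gamma=g) for g in gammas], axis=0)

    # 3. Reduce dimensionality with kernel PCA on the combined kernel.
    Z = KernelPCA(n_components=n_components, kernel="precomputed").fit_transform(K)

    # 4. Density-peaks statistics: local density rho, and distance delta to the
    #    nearest point that ranks higher in density (its "parent").
    D = euclidean_distances(Z)
    d_c = np.percentile(D[D > 0], 5)  # hypothetical cutoff distance
    rho = np.exp(-((D / d_c) ** 2)).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # densest point first
    delta = np.zeros(len(Z))
    parent = np.full(len(Z), -1)
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, len(order)):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i], parent[i] = D[i, j], j

    # 5. Treat points that Isolation Forest flags as outliers in (rho, delta)
    #    space, and that have above-median rho and delta, as cluster centroids.
    flags = IsolationForest(random_state=seed).fit_predict(np.column_stack([rho, delta]))
    is_peak = (flags == -1) & (rho > np.median(rho)) & (delta > np.median(delta))
    is_peak[order[0]] = True  # the densest point always seeds a cluster
    centroids = np.flatnonzero(is_peak)

    # 6. Assign labels in density order: each point inherits its parent's label.
    labels = np.full(len(Z), -1)
    labels[centroids] = np.arange(len(centroids))
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return X_full, labels


# Toy usage: two Gaussian blobs with roughly 10% of entries removed.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 4)), rng.normal(3, 0.3, (30, 4))])
X[rng.random(X.shape) < 0.1] = np.nan
X_full, labels = mkdci_like_sketch(X)
```

Because centroid detection is delegated to an outlier detector, the number of clusters is not passed in as a parameter, which mirrors the abstract's emphasis on reducing user-set parameters. The quality of the result in this sketch still depends on the assumed kernel widths and cutoff.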
Pages: 18
Related papers
50 in total
  • [1] Clustering on Multiple Incomplete Datasets via Collective Kernel Learning
    Shao, Weixiang
    Shi, Xiaoxiao
    Yu, Philip S.
    [J]. 2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013 : 1181 - 1186
  • [2] Incomplete Multiple Kernel Alignment Maximization for Clustering
    Liu, Xinwang
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (03) : 1412 - 1424
  • [3] Clustering Algorithm for Multi-density Datasets
    Fahim, Ahmed
    [J]. ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, 2019, 22 (3-4) : 244 - 258
  • [4] Multiple kernel learning for integrative consensus clustering of omic datasets
    Cabassi, Alessandra
    Kirk, Paul D. W.
    [J]. BIOINFORMATICS, 2020, 36 (18) : 4789 - 4796
  • [5] A clustering algorithm based on density kernel extension
    Dai, Wei-Di
    He, Pi-Lian
    Hou, Yue-Xian
    Kang, Xiao-Dong
    [J]. ADVANCES IN MACHINE LEARNING AND CYBERNETICS, 2006, 3930 : 189 - 198
  • [6] A Fast and Effective Multiple Kernel Clustering Method on Incomplete Data
    Xiang, Lingyun
    Zhao, Guohan
    Li, Qian
    Kim, Gwang-jun
    Alfarraj, Osama
    Tolba, Amr
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 67 (01) : 267 - 284
  • [7] A multi-kernel spectral clustering algorithm based on incomplete views
    Zhang, Wei
    Yang, Yan
    Hu, Jie
    [J]. DATA SCIENCE AND KNOWLEDGE ENGINEERING FOR SENSING DECISION SUPPORT, 2018, 11 : 477 - 484
  • [8] A new algorithm for clustering based on kernel density estimation
    Matioli, L. C.
    Santos, S. R.
    Kleina, M.
    Leite, E. A.
    [J]. JOURNAL OF APPLIED STATISTICS, 2018, 45 (02) : 347 - 366
  • [9] Cosine kernel based density peaks clustering algorithm
    Wang, Jiayuan
    Lv, Li
    Wu, Runxiu
    Fan, Tanghuai
    Lee, Ivan
    [J]. INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2020, 12 (01) : 1 - 20
  • [10] A Kernel Density Window Clustering Algorithm for Radar Pulses
    Lee, Dong-Weon
    Han, Jin-Woo
    Song, Kyu-Ha
    Lee, Won Don
    [J]. THIRD 2008 INTERNATIONAL CONFERENCE ON CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS, 2008 : 1048 - 1053