A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics

Cited by: 8
Authors
Liao, Longlong [1, 2]
Li, Kenli [3]
Li, Keqin [4]
Yang, Canqun [1, 2]
Tian, Qi [5]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Sanyi Rd, Changsha, Hunan, Peoples R China
[2] State Key Lab High Performance Comp, Sanyi Rd, Changsha, Hunan, Peoples R China
[3] Hunan Univ, Coll Informat Sci & Engn, Lushan Rd, Changsha, Hunan, Peoples R China
[4] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
[5] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX USA
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Density clustering; Matrix completion; Unsupervised multiple kernel learning; Dimensionality reduction; Outlier detection; Norm
DOI
10.1186/s12918-018-0630-6
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: Many bioinformatics datasets available for clustering are incomplete, i.e., some data samples are missing attribute values that clustering algorithms need. A variety of clustering algorithms have been proposed in recent years, but they are usually limited to clustering complete datasets. Moreover, conventional clustering algorithms cannot achieve a trade-off between the accuracy and efficiency of the clustering process, since many essential parameters are determined by the human user's experience.
Results: The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets, called MKDCI. The MKDCI algorithm consists of recovering the missing attribute values of input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel based on multiple basis kernels, detecting cluster centroids with the Isolation Forests method, assigning clusters of arbitrary shape, and visualizing the results.
Conclusions: Extensive experiments on several well-known clustering datasets in the bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, the proposed MKDCI algorithm tends to automatically produce clusters of better quality on incomplete bioinformatics datasets.
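To make the first stages named in the abstract concrete, here is a toy sketch in plain Python. It is not the authors' MKDCI implementation: column-mean imputation stands in for matrix completion, fixed uniform weights stand in for the learned kernel combination, and the dimensionality-reduction and Isolation Forests steps are omitted. All function names, the sample data, and the parameters `gammas` and `weights` are hypothetical.

```python
import math


def impute_column_means(rows):
    """Fill None entries with the mean of the observed values in that column.

    Assumption: a simple stand-in for the matrix-completion step.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) basis kernel between two points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel_matrix(rows, gammas, weights):
    """Weighted sum of RBF basis kernels, one per bandwidth in `gammas`.

    Assumption: the weights are fixed by hand here, not learned.
    """
    n = len(rows)
    return [
        [
            sum(w * rbf_kernel(rows[i], rows[j], g) for g, w in zip(gammas, weights))
            for j in range(n)
        ]
        for i in range(n)
    ]


def kernel_density(K):
    """Density estimate per sample: mean kernel similarity to all samples."""
    return [sum(row) / len(row) for row in K]


# Hypothetical incomplete dataset: two tight groups plus one isolated point.
data = [
    [1.0, 1.1],
    [0.9, None],
    [1.1, 0.9],
    [5.0, 5.1],
    [None, 4.9],
    [5.1, 5.0],
    [10.0, -3.0],
]

completed = impute_column_means(data)
K = combined_kernel_matrix(completed, gammas=[0.1, 1.0], weights=[0.5, 0.5])
density = kernel_density(K)
lowest = min(range(len(density)), key=density.__getitem__)
print("lowest-density sample index:", lowest)
```

In this toy data the isolated point receives the lowest density, which is the kind of signal a density-based method can use to separate candidate centroids from outliers. A realistic pipeline would replace each stand-in with the corresponding technique from the abstract.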
Pages: 18