A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics

Cited by: 8

Authors:

Liao, Longlong [1,2]

Li, Kenli [3]

Li, Keqin [4]

Yang, Canqun [1,2]

Tian, Qi [5]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Sanyi Rd, Changsha, Hunan, Peoples R China

[2] State Key Lab High Performance Comp, Sanyi Rd, Changsha, Hunan, Peoples R China

[3] Hunan Univ, Coll Informat Sci & Engn, Lushan Rd, Changsha, Hunan, Peoples R China

[4] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA

[5] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX USA

Source:

BMC SYSTEMS BIOLOGY | 2018 / Vol. 12

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Density clustering; Matrix completion; Unsupervised multiple kernel learning; Dimensionality reduction; Outlier detection; Norm;

DOI:

10.1186/s12918-018-0630-6

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Background: Many bioinformatics datasets available for clustering are incomplete, i.e., some data samples lack attribute values that clustering algorithms require. A variety of clustering algorithms have been proposed in recent years, but most are limited to clustering complete datasets. Moreover, conventional clustering algorithms cannot achieve a trade-off between the accuracy and efficiency of the clustering process, because many essential parameters are determined by the user's experience. Results: The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets, called MKDCI. MKDCI recovers the missing attribute values of the input data samples, learns an optimally combined kernel for clustering the input dataset, reduces dimensionality with the optimal kernel built from multiple basis kernels, detects cluster centroids with the Isolation Forests method, assigns clusters of arbitrary shape, and visualizes the results. Conclusions: Extensive experiments on several well-known bioinformatics clustering datasets demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, MKDCI tends to automatically produce higher-quality clusters on incomplete bioinformatics datasets.
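
The pipeline summarized in the abstract can be illustrated with a heavily simplified sketch. This is not the authors' MKDCI implementation. Every stage below is a labeled stand-in: column-mean imputation replaces matrix completion, fixed uniform weights over Gaussian basis kernels replace unsupervised multiple kernel learning, ranking points by density times separation replaces Isolation Forests centroid detection, and dimensionality reduction and visualization are omitted. All function names, bandwidths, and the toy data are hypothetical.

```python
import math

def impute_column_means(rows):
    """Stand-in for matrix completion: fill None with the column mean."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def combined_kernel(a, b, bandwidths, weights):
    """Weighted sum of Gaussian basis kernels (weights are fixed, not learned)."""
    d2 = sq_dist(a, b)
    return sum(w * math.exp(-d2 / (2.0 * s * s)) for s, w in zip(bandwidths, weights))

def density_cluster(rows, n_clusters, bandwidths=(0.5, 1.0, 2.0), weights=None):
    """Return (labels, centers) for rows that may contain None entries."""
    if weights is None:
        weights = [1.0 / len(bandwidths)] * len(bandwidths)
    X = impute_column_means(rows)
    n = len(X)
    # Kernel density of each point under the combined kernel.
    rho = [sum(combined_kernel(X[i], X[j], bandwidths, weights)
               for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    rank = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n   # distance to the nearest denser point
    parent = [-1] * n   # index of that denser point
    for r, i in enumerate(order):
        if r == 0:
            continue
        parent[i] = min(order[:r], key=lambda j: sq_dist(X[i], X[j]))
        delta[i] = math.sqrt(sq_dist(X[i], X[parent[i]]))
    delta[order[0]] = max(delta)  # the densest point gets the largest separation
    # Stand-in for Isolation Forests: centers are the points with the
    # largest density * separation score.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point inherits the label of its nearest denser point,
    # which lets clusters follow arbitrary shapes.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Toy data: two groups, each with one incomplete sample (None = missing).
toy = [[0.0, 0.0], [0.2, 0.1], [0.1, None], [0.0, 0.2],
       [5.0, 5.0], [5.1, 5.2], [None, 5.1], [5.2, 5.0]]
labels, centers = density_cluster(toy, n_clusters=2)
```

In this sketch the number of clusters is passed in by hand, whereas the abstract describes MKDCI as detecting centroids automatically. The sketch only shows how recovery of missing values, a combined kernel, and density-based assignment fit together.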


Pages: 18
