PWSC: a novel clustering method based on polynomial weight-adjusted sparse clustering for sparse biomedical data and its application in cancer subtyping

Cited by: 0

Authors:

Zhang, Xiaomeng ^{[1]}

Zhang, Hongtao ^{[2]}

Wang, Zhihao ^{[2]}

Ma, Xiaofei ^{[2]}

Luo, Jiancheng ^{[2]}

Zhu, Yingying ^{[3]}

Affiliations:

[1] Huazhong Univ Sci & Technol, Tongji Hosp, Tongji Med Coll, Dept Nephrol, Wuhan 430030, Hubei, Peoples R China

[2] Wuhan Univ, Sch Math & Stat, Wuhan 430070, Hubei, Peoples R China

[3] Huazhong Univ Sci & Technol, Tongji Hosp, Tongji Med Coll, Dept Oncol, Wuhan 430030, Hubei, Peoples R China

Source:

BMC BIOINFORMATICS | 2023 / Volume 24 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Hierarchical clustering; Polynomial weight; Consensus clustering; Sparse biomedical data; ALGORITHMS;

DOI:

10.1186/s12859-023-05595-4

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Background: Clustering analysis is widely used to interpret biomedical data and uncover new knowledge and patterns. However, conventional clustering methods are not effective when dealing with sparse biomedical data. To overcome this limitation, we propose a hierarchical clustering method called polynomial weight-adjusted sparse clustering (PWSC).

Results: The PWSC algorithm adjusts feature weights using a polynomial function, redefines the distances between samples, and performs hierarchical clustering based on these adjusted distances. Additionally, we incorporate a consensus clustering approach to determine the optimal number of clusters. This consensus approach uses the relative change in the cumulative distribution function to identify the best number of clusters, resulting in more stable clustering results. Using the PWSC algorithm, we successfully classified a cohort of gastric cancer patients, enabling categorization of patients carrying different types of altered genes. Evaluation using entropy showed a significant improvement (p = 2.905e-05), and the Calinski-Harabasz index (CHI) showed a 100% improvement in the quality of the best classification compared with conventional algorithms. Similarly, significantly increased entropy (p = 0.0336) and a comparable CHI were observed when classifying a second cohort, of colorectal cancer patients, using microbial abundance. These applications in cancer subtyping demonstrate that PWSC is highly applicable to different types of biomedical data. To facilitate its application, we have developed a user-friendly tool that implements the PWSC algorithm, which can be accessed at http://pwsc.aiyimed.com/.

Conclusions: PWSC addresses the limitations of conventional approaches when clustering sparse biomedical data. By adjusting feature weights and employing consensus clustering, we achieve improved clustering results compared with conventional methods. The PWSC algorithm provides a valuable tool for researchers in the field, enabling more accurate and stable clustering analysis. Its application can enhance our understanding of complex biological systems and contribute to advances in various biomedical disciplines.
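The pipeline described in the abstract (polynomial feature weights, redefined sample distances, hierarchical clustering, and a consensus step based on the cumulative distribution function) can be illustrated with a rough Python sketch. This is not the authors' implementation: the abstract does not give the polynomial, so the weight used here (non-zero frequency raised to `degree`) is an assumption, as are the function names, the average linkage, and the resampling settings. The sketch only computes the relative change in the area under the consensus CDF and does not encode a rule for picking the final number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def polynomial_weights(X, degree=2):
    """Hypothetical weighting (an assumption, not the published formula):
    each feature's non-zero frequency raised to the power `degree`."""
    freq = (X != 0).mean(axis=0)  # fraction of samples where the feature is non-zero
    return freq ** degree


def weighted_hierarchical_labels(X, n_clusters, degree=2):
    """Hierarchical clustering on polynomially weighted Euclidean distances."""
    w = polynomial_weights(X, degree)
    # Scaling columns by sqrt(w) makes plain Euclidean distance equal to the
    # w-weighted Euclidean distance.
    dists = pdist(X * np.sqrt(w), metric="euclidean")
    tree = linkage(dists, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def consensus_cdf_area(X, n_clusters, n_resamples=30, frac=0.8, seed=0):
    """Area under the empirical CDF of pairwise consensus values."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))  # times a pair landed in the same cluster
    sampled = np.zeros((n, n))   # times a pair was drawn in the same resample
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = weighted_hierarchical_labels(X[idx], n_clusters)
        same = labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += same
    upper = np.triu_indices(n, k=1)
    valid = sampled[upper] > 0
    consensus = together[upper][valid] / sampled[upper][valid]
    grid = np.linspace(0.0, 1.0, 101)
    cdf = np.searchsorted(np.sort(consensus), grid, side="right") / consensus.size
    # Trapezoidal rule written out by hand.
    return float(np.sum((cdf[1:] + cdf[:-1]) / 2 * np.diff(grid)))


def relative_cdf_area_changes(X, k_values):
    """Relative change in CDF area between consecutive candidate cluster counts.
    The first entry is the area itself, since it has no predecessor."""
    areas = [consensus_cdf_area(X, k) for k in k_values]
    changes = [areas[0]] + [(b - a) / a for a, b in zip(areas, areas[1:])]
    return dict(zip(k_values, changes))
```

A call such as `relative_cdf_area_changes(X, [2, 3, 4, 5])` on a samples-by-features matrix `X` returns one value per candidate cluster count; how those values are turned into a final choice is left open here because the abstract does not specify it.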


Pages: 17
