PWSC: a novel clustering method based on polynomial weight-adjusted sparse clustering for sparse biomedical data and its application in cancer subtyping

Authors
Zhang, Xiaomeng [1]
Zhang, Hongtao [2]
Wang, Zhihao [2]
Ma, Xiaofei [2]
Luo, Jiancheng [2]
Zhu, Yingying [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Tongji Hosp, Tongji Med Coll, Dept Nephrol, Wuhan 430030, Hubei, Peoples R China
[2] Wuhan Univ, Sch Math & Stat, Wuhan 430070, Hubei, Peoples R China
[3] Huazhong Univ Sci & Technol, Tongji Hosp, Tongji Med Coll, Dept Oncol, Wuhan 430030, Hubei, Peoples R China
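Funding
National Natural Science Foundation of China
Keywords
Hierarchical clustering; Polynomial weight; Consensus clustering; Sparse biomedical data; Algorithms
DOI
10.1186/s12859-023-05595-4
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract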
Background
Clustering analysis is widely used to interpret biomedical data and uncover new knowledge and patterns. However, conventional clustering methods are not effective on sparse biomedical data. To overcome this limitation, we propose a hierarchical clustering method called polynomial weight-adjusted sparse clustering (PWSC).
Results
The PWSC algorithm adjusts feature weights using a polynomial function, redefines the distances between samples with these weights, and performs hierarchical clustering on the adjusted distances. In addition, we incorporate a consensus clustering approach to determine the optimal number of clusters: it uses the relative change in the cumulative distribution function (CDF) to identify the best number of clusters, which yields more stable clustering results. Using PWSC, we classified a cohort of gastric cancer patients and separated patients carrying different types of altered genes. Compared with conventional algorithms, evaluation with entropy showed a significant improvement (p = 2.905e-05), and the Calinski-Harabasz index (CHI) showed a 100% improvement in the quality of the best classification. Similarly, when we classified a colorectal cancer cohort by microbial abundance, we observed significantly increased entropy (p = 0.0336) and a comparable CHI. These cancer subtyping applications demonstrate that PWSC is broadly applicable to different types of biomedical data. To facilitate its use, we have developed a user-friendly tool that implements the PWSC algorithm, accessible at http://pwsc.aiyimed.com/.
Conclusions
PWSC addresses the limitations of conventional approaches in clustering sparse biomedical data. By adjusting feature weights and employing consensus clustering, it achieves better clustering results than conventional methods. The PWSC algorithm provides a valuable tool for researchers in the field, enabling more accurate and stable clustering analysis. Its application can enhance our understanding of complex biological systems and contribute to advances in various biomedical disciplines.
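The Results paragraph describes three steps: polynomial reweighting of features, redefined sample distances, and hierarchical clustering on those distances. The abstract gives neither the polynomial nor the distance, so the following is only a dependency-free sketch under stated assumptions: the hypothetical weight of feature j is a user-chosen polynomial of its nonzero frequency p_j (clamped at zero), distances are weighted Euclidean, and clustering is naive average-linkage agglomeration. The names `polynomial_weights`, `weighted_distances`, and `average_linkage` are illustrative and do not come from the PWSC tool.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def polynomial_weights(X: Matrix, coeffs: Sequence[float]) -> List[float]:
    """Hypothetical PWSC-style weight (an assumption, not the published formula):
    w_j = max(0, sum_k coeffs[k] * p_j**k), p_j = fraction of samples where feature j != 0."""
    n = len(X)
    weights = []
    for j in range(len(X[0])):
        p = sum(1 for row in X if row[j] != 0) / n  # nonzero frequency of feature j
        weights.append(max(0.0, sum(c * p ** k for k, c in enumerate(coeffs))))
    return weights


def weighted_distances(X: Matrix, w: Sequence[float]) -> List[List[float]]:
    """Symmetric matrix of weighted Euclidean distances sqrt(sum_j w_j * (x_j - y_j)**2)."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum(wj * (a - b) ** 2 for wj, a, b in zip(w, X[i], X[j])))
            D[i][j] = D[j][i] = d
    return D


def average_linkage(D: List[List[float]], k: int) -> List[int]:
    """Naive agglomerative clustering (average linkage, roughly O(n^3) overall), stopped at k clusters."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > max(k, 1):
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_d = [D[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pair_d) / len(pair_d)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # b > a, so deleting b leaves index a valid
        del clusters[b]
    labels = [0] * len(D)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Toy sparse matrix: 4 samples x 3 features.
X = [[1, 0, 0], [1, 0, 0.1], [0, 5, 5], [0, 5, 4.9]]
w = polynomial_weights(X, coeffs=[0.0, 1.0])  # illustrative: weight = nonzero frequency
print(w)  # [0.5, 0.5, 0.75]
print(average_linkage(weighted_distances(X, w), k=2))  # [0, 0, 1, 1]
```

Average linkage is used here only because it is short to write without dependencies; on real data, `scipy.cluster.hierarchy.linkage` applied to a condensed distance matrix would be the usual choice.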
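For choosing the number of clusters, the abstract states only that consensus clustering and the relative change in the CDF are used. A sketch of the common resampling scheme, under assumptions: cluster many random 80% subsamples, record how often each pair of samples lands in the same cluster (the consensus matrix), summarize each k by the area under the empirical CDF of the consensus values, and compare that area across successive k. The subsampling fraction, the number of resamples, the toy 1-D `gap_clusterer`, and the stopping rule in `choose_k` (stop once the relative gain falls below a hypothetical `min_gain`) are illustrative choices, not PWSC's published settings.

```python
import random
from typing import Callable, Dict, List, Sequence

# A base clusterer maps (sample indices, k) to one label per index.
ClusterFn = Callable[[List[int], int], List[int]]


def consensus_matrix(n: int, k: int, cluster_fn: ClusterFn, n_resamples: int = 50,
                     frac: float = 0.8, seed: int = 0) -> List[List[float]]:
    """M[i][j] = (#subsamples with i, j in the same cluster) / (#subsamples containing both).
    The defaults for n_resamples and frac are illustrative assumptions."""
    rng = random.Random(seed)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = min(n, max(2, round(frac * n)))
    for _ in range(n_resamples):
        idx = sorted(rng.sample(range(n), size))
        labels = cluster_fn(idx, k)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
            for i in range(n)]


def cdf_area(M: List[List[float]]) -> float:
    """Area under the empirical CDF of the off-diagonal consensus values over [0, 1],
    which for values in [0, 1] equals 1 - mean(values)."""
    n = len(M)
    vals = [M[i][j] for i in range(n) for j in range(i + 1, n)]
    return 1.0 - sum(vals) / len(vals)


def relative_changes(areas: Dict[int, float]) -> Dict[int, float]:
    """Relative change of the CDF area from each k to the next larger k."""
    ks = sorted(areas)
    return {k: (areas[k] - areas[p]) / areas[p] if areas[p] > 0 else float("inf")
            for p, k in zip(ks, ks[1:])}


def choose_k(areas: Dict[int, float], min_gain: float = 0.1) -> int:
    """Hypothetical stopping rule: keep the largest k reached before the gain drops below min_gain."""
    best = min(areas)
    for k, gain in sorted(relative_changes(areas).items()):
        if gain < min_gain:
            break
        best = k
    return best


def gap_clusterer(values: Sequence[float]) -> ClusterFn:
    """Toy 1-D base clusterer: sort the selected values and cut at the k - 1 widest gaps."""
    def cluster_fn(idx: List[int], k: int) -> List[int]:
        order = sorted(range(len(idx)), key=lambda a: values[idx[a]])

        def gap(p: int) -> float:
            return values[idx[order[p]]] - values[idx[order[p - 1]]]

        cuts = set(sorted(range(1, len(order)), key=gap, reverse=True)[:k - 1])
        labels, c = [0] * len(idx), 0
        for p, a in enumerate(order):
            if p in cuts:
                c += 1
            labels[a] = c
        return labels
    return cluster_fn


values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
fn = gap_clusterer(values)
areas = {k: cdf_area(consensus_matrix(len(values), k, fn)) for k in range(2, 6)}
print({k: round(a, 3) for k, a in areas.items()}, "chosen k:", choose_k(areas))
```

When clusters are perfectly stable, every consensus value is 0 or 1; the CDF area then typically grows with k as fewer pairs stay together, and the relative gain tends to shrink once k passes the natural number of groups. Any real clusterer, such as the weighted hierarchical step, can be passed in place of the toy `gap_clusterer`.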
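The reported evaluation relies on entropy and the Calinski-Harabasz index (CHI). CHI has a standard definition: between-cluster dispersion divided by k - 1, over within-cluster dispersion divided by n - k, so larger values mean tighter, better-separated clusters. The abstract does not say which entropy was computed, so `label_entropy` below assumes, purely for illustration, the Shannon entropy (in bits) of the cluster-size distribution; the paper's entropy may be defined differently.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def calinski_harabasz(X: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """CH = [B / (k - 1)] / [W / (n - k)]: B is the size-weighted squared distance of cluster
    centroids to the overall centroid, W the squared distance of samples to their own centroid."""
    n, dim = len(X), len(X[0])
    groups: Dict[int, List[Sequence[float]]] = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("CHI needs 1 < number of clusters < number of samples")
    overall = [sum(x[d] for x in X) / n for d in range(dim)]
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum((x[d] - centroid[d]) ** 2 for x in members for d in range(dim))
    if within == 0.0:
        return math.inf  # every cluster collapses to a single point
    return (between / (k - 1)) / (within / (n - k))


def label_entropy(labels: Sequence[int]) -> float:
    """Assumed entropy: Shannon entropy in bits of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
print(calinski_harabasz(X, labels))  # 50.0
print(label_entropy(labels))  # 1.0
```

If scikit-learn is available, `sklearn.metrics.calinski_harabasz_score` computes the same index (apart from how the zero within-cluster dispersion edge case is handled) and can serve as a cross-check.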
Pages: 17
Journal: BMC Bioinformatics, 24