Medoidshift clustering applied to genomic bulk tumor data

Cited by: 3
Authors
Roman, Theodore [1, 2]
Xie, Lu [1, 2]
Schwartz, Russell [1, 3]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Computat Biol Dept, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
[2] Joint Carnegie Mellon Univ Pittsburgh, PhD Program Computat Biol, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Mellon Coll Sci, Dept Biol Sci, 4400 Fifth Ave, Pittsburgh, PA 15213 USA
Source
BMC GENOMICS | 2016 / Vol. 17
Keywords
Computational biology; Clustering; Tumor; Heterogeneity; INTRATUMOR HETEROGENEITY; SEQUENCING REVEALS; GENETIC-ANALYSIS; WHOLE-GENOME; IN-SITU; CELL; EVOLUTION; EXPRESSION; MUTATIONS; CARCINOMA;
DOI
10.1186/s12864-015-2302-x
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
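For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of a single generic medoidshift pass, not the authors' implementation. It assumes Euclidean distances between samples and a Gaussian weighting with a user-chosen bandwidth. The function name `medoidshift` and its parameters are hypothetical, and the sketch omits both the recursive merging of medoids used in the full algorithm and any modifications the paper introduces for tumor data.

```python
import numpy as np


def medoidshift(points, bandwidth):
    """Single medoidshift pass over the rows of `points`.

    Hypothetical illustration only: assumes Euclidean distance and
    Gaussian weights, and is not the method as published.
    Returns (labels, medoids), where `medoids` holds row indices.
    """
    X = np.asarray(points, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)

    # weights[j, i]: influence of sample j on the shift of sample i.
    weights = np.exp(-d2 / bandwidth**2)

    # cost[k, i] = sum_j d2[k, j] * weights[j, i]; the next medoid for
    # sample i is the data point k that minimizes this weighted cost.
    cost = d2 @ weights
    nxt = np.argmin(cost, axis=0)

    # Follow the pointers until every sample sits on a fixed point.
    # The loop is bounded by n steps as a safeguard.
    roots = nxt.copy()
    for _ in range(n):
        stepped = nxt[roots]
        if np.array_equal(stepped, roots):
            break
        roots = stepped

    # Samples that reach the same fixed point share a cluster label.
    medoids, labels = np.unique(roots, return_inverse=True)
    return labels, medoids
```

Because every shift lands on an existing sample, the method needs only a pairwise distance matrix and no preset number of clusters. The bandwidth remains a free parameter that controls how many groups emerge.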
Pages: 11
Related papers
50 in total
  • [41] Explanations of unsupervised learning clustering applied to data security analysis
    Corral, G.
    Armengol, E.
    Fornells, A.
    Golobardes, E.
    NEUROCOMPUTING, 2009, 72 (13-15) : 2754 - 2762
  • [42] A Clustering Algorithm for Polygonal Data Applied to Scientific Journal Profiles
    Silva, Wagner J. F.
    Souza, Pedro J. C.
    Souza, Renata M. C. R.
    Cysneiros, Francisco Jose A.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 13766 - 13777
  • [43] CLUSTERING APPLIED TO AN ORGANIC AIR-POLLUTANTS DATA SET
    SMEYERSVERBEKE, J
    DENHARTOG, JC
    DEKKER, WH
    MASSART, DL
    ANALUSIS, 1984, 12 (10) : 486 - 489
  • [44] Multi-scale supervised clustering-based feature selection for tumor classification and identification of biomarkers and targets on genomic data
    Xu, Da
    Zhang, Jialin
    Xu, Hanxiao
    Zhang, Yusen
    Chen, Wei
    Gao, Rui
    Dehmer, Matthias
    BMC GENOMICS, 2020, 21 (01)
  • [46] Biological Data Mining for Genomic Clustering Using Unsupervised Neural Learning
    Sen, Shreyas
    Narasimhan, Seetharam
    Konar, Amit
    ENGINEERING LETTERS, 2007, 14 (02)
  • [47] MGKA: A genetic algorithm-based clustering technique for genomic data
    Hung Nguyen
    Louis, Sushil J.
    Tin Nguyen
    2019 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2019, : 103 - 110
  • [48] Learning A Highly Resolved Tree of Phenotypes Using Genomic Data Clustering
    Feng, Yuanjian
    Miller, David J.
    Clarke, Robert
    Hoffman, Eric P.
    Wang, Yue
    BIBMW: 2009 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOP, 2009, : 348 - 348
  • [49] Data clustering based on correlation analysis applied to highly variable domains
    Tosi, Stefania
    Casolari, Sara
    Colajanni, Michele
    COMPUTER NETWORKS, 2013, 57 (15) : 3025 - 3038
  • [50] Evaluation of Fitness Functions for Swarm Clustering Applied to Gene Expression Data
    Banu, P. K. Nizar
    Andrews, S.
    COMPUTATIONAL INTELLIGENCE IN DATA MINING, VOL 3, 2015, 33