Distance-based clustering of CGH data

Cited by: 54
Authors
Liu, Jun [1]
Mohammed, Jaaved
Carter, James
Ranka, Sanjay
Kahveci, Tamer
Baudis, Michael
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
[2] Rhein Westfal TH Aachen, Inst Humangenet, D-5100 Aachen, Germany
Funding
National Science Foundation (USA)
DOI
10.1093/bioinformatics/btl185
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: We consider the problem of clustering a population of Comparative Genomic Hybridization (CGH) data samples. The goal is to develop a systematic way of placing patients with similar CGH imbalance profiles into the same cluster. We expect patients with the same cancer type to generally fall into the same cluster, because their underlying CGH profiles will be similar.

Results: We focus on distance-based clustering strategies, which proceed in two steps: (1) the distances between all pairs of CGH samples are computed; (2) the CGH samples are clustered based on these distances. We develop three pairwise distance/similarity measures, namely raw, cosine and sim. The raw measure disregards correlation between contiguous genomic intervals and compares the aberrations in each genomic interval separately. The remaining measures assume that consecutive genomic intervals may be correlated. Cosine maps pairs of CGH samples into vectors in a high-dimensional space and measures the angle between them. Sim measures the number of independent common aberrations. We test our distance/similarity measures with three well-known clustering algorithms: bottom-up, top-down and k-means, with and without centroid shrinking. Our results show that sim consistently performs better than the remaining measures. This indicates that the correlation of neighboring genomic intervals should be considered in the structural analysis of CGH datasets. The combination of sim with top-down clustering emerged as the best approach.
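
The two-step procedure described in the abstract (compute pairwise similarities, then cluster on them) can be illustrated with a minimal sketch. Everything specific here is an assumption, not taken from the paper: the encoding of a CGH sample as a list with one entry per genomic interval (1 = gain, -1 = loss, 0 = no aberration), the function names `raw_similarity`, `cosine_similarity`, `segment_similarity` and `bottom_up`, and the average-linkage rule. The three functions are only one plausible reading of the raw, cosine and sim measures; the paper's exact definitions may differ.

```python
from itertools import combinations

# Assumed encoding (not from the source): one entry per genomic interval,
# 1 = gain, -1 = loss, 0 = no aberration.

def raw_similarity(a, b):
    """Count intervals where both samples carry the same aberration."""
    return sum(1 for x, y in zip(a, b) if x != 0 and x == y)

def cosine_similarity(a, b):
    """Cosine of the angle between the two samples viewed as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sample with no aberrations has no direction
    return dot / (norm_a * norm_b)

def segment_similarity(a, b):
    """Count maximal runs of contiguous intervals sharing the same aberration.

    Hypothetical stand-in for 'sim': a run of neighboring common aberrations
    counts once instead of once per interval.
    """
    count = 0
    previous = 0
    for x, y in zip(a, b):
        current = x if (x != 0 and x == y) else 0
        if current != 0 and current != previous:
            count += 1
        previous = current
    return count

def bottom_up(samples, similarity, k):
    """Step 1: all pairwise similarities. Step 2: average-linkage merging
    of the most similar clusters until k clusters remain."""
    n = len(samples)
    sim = [[similarity(samples[i], samples[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        p, q = max(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]],
                                            clusters[pair[1]]))
        clusters[p] = clusters[p] + clusters[q]  # p < q, so deleting q is safe
        del clusters[q]
    return [sorted(c) for c in clusters]

# Toy data, invented for illustration only.
samples = [
    [1, 1, 1, 0, 0, -1, 0, 0],
    [1, 1, 0, 0, 0, -1, 0, 0],
    [0, 0, 0, -1, -1, 0, 1, 1],
    [0, 0, 0, -1, 0, 0, 1, 1],
]
print(raw_similarity(samples[0], samples[1]))      # interval-by-interval count
print(segment_similarity(samples[0], samples[1]))  # contiguous runs count once
print(bottom_up(samples, segment_similarity, 2))
```

On the toy data, the first two samples share three aberrant intervals but only two contiguous runs, which is the kind of difference between treating intervals independently and treating neighboring intervals as correlated that the abstract describes.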
Pages: 1971-1978
Page count: 8