Integrative classification and analysis of multiple arrayCGH datasets with probe alignment

Times cited: 6
Authors
Ze Tian [1]
Rui Kuang [1]
Affiliation
[1] Univ Minnesota Twin Cities, Dept Comp Sci & Engn, Minneapolis, MN USA
Keywords
COPY NUMBER VARIATION; GENE-EXPRESSION; BLADDER-CANCER; HUMAN GENOME; CGH DATA; ALGORITHMS; MATRIX
DOI
10.1093/bioinformatics/btq428
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Array comparative genomic hybridization (arrayCGH) is widely used to measure DNA copy numbers in cancer research. ArrayCGH data report log-ratio intensities of thousands of probes sampled along the chromosomes. The locations and lengths of the probes typically vary from one experiment to another, and this discrepancy poses a challenge for integrated classification or analysis across multiple arrayCGH datasets. We propose an alignment-based framework to integrate arrayCGH samples generated from different probe sets. The framework seeks an optimal alignment between the probe series of one arrayCGH sample and that of another sample, with the goal of finding the maximum possible overlap of DNA copy number variations between the two measured chromosomes. We introduce an alignment kernel for integrative classification of patient samples and a multiple probe alignment (MPA) algorithm for identifying common regions with copy number aberrations.

Results: The probe alignment kernel and the MPA algorithm were evaluated by integrating three bladder cancer datasets as well as artificial datasets. In these experiments, when arrayCGH samples from multiple datasets were integrated, support vector machines using the probe alignment kernel achieved significantly higher patient sample classification accuracy than with other baseline kernels. The experiments also demonstrated that the MPA algorithm can find common DNA aberrations that the standard interpolation method cannot identify. Furthermore, the MPA algorithm identified many known bladder cancer DNA aberrations, which together contain four known bladder cancer genes; three of these genes cannot be detected by interpolation.
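To make the pairwise step more concrete, here is a minimal Python sketch of an order-preserving alignment between two probe series measured on the same chromosome. It is not the authors' algorithm: the scoring rule (the product of two probes' log-ratios, so concordant gains or concordant losses score positively), the `max_gap` distance cutoff, the free gaps, and the names `Probe`, `gap_between`, and `align_probes` are all hypothetical choices made for illustration. It fills a standard dynamic-programming table in O(nm) time and uses a traceback to recover which probes were paired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    start: int        # genomic start position in bp
    end: int          # genomic end position in bp
    log_ratio: float  # log-ratio intensity (test vs. reference)


def gap_between(a: Probe, b: Probe) -> int:
    """Distance in bp between two probe intervals (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def align_probes(xs, ys, max_gap=1_000_000):
    """Order-preserving alignment of two probe series on one chromosome.

    Hypothetical scoring (illustration only): pairing xs[i] with ys[j] earns
    xs[i].log_ratio * ys[j].log_ratio and is allowed only when the probes lie
    within max_gap bp of each other; leaving a probe unpaired costs nothing.
    Returns (best_score, list of (i, j) index pairs in chromosomal order).
    """
    n, m = len(xs), len(ys)
    # score[i][j] = best score for aligning xs[:i] with ys[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Option 1/2: leave xs[i-1] or ys[j-1] unpaired
            best = max(score[i - 1][j], score[i][j - 1])
            # Option 3: pair the two probes if they are close enough
            if gap_between(xs[i - 1], ys[j - 1]) <= max_gap:
                paired = score[i - 1][j - 1] + xs[i - 1].log_ratio * ys[j - 1].log_ratio
                best = max(best, paired)
            score[i][j] = best

    # Traceback: any move that reproduces the stored value lies on an optimal path
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j]:
            i -= 1
        elif score[i][j] == score[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    a = [Probe(0, 100, 0.8), Probe(200, 300, 0.9), Probe(400, 500, -0.5)]
    b = [Probe(50, 150, 0.7), Probe(250, 350, 1.0), Probe(450, 550, -0.6)]
    s, p = align_probes(a, b, max_gap=100)
    print(round(s, 2), p)  # prints: 1.76 [(0, 0), (1, 1), (2, 2)]
```

In the toy example each probe is paired with its positional neighbor in the other series, giving 0.8·0.7 + 0.9·1.0 + (-0.5)(-0.6) ≈ 1.76. If a score like this were used as an SVM kernel, note that alignment scores are not guaranteed to be positive semidefinite, so such a kernel would need to be checked or corrected before training.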
Pages: 2313–2320
Number of pages: 8