Integrative classification and analysis of multiple arrayCGH datasets with probe alignment

Cited by: 6
Authors
Ze Tian [1]
Rui Kuang [1]
Affiliation
[1] Univ Minnesota Twin Cities, Dept Comp Sci & Engn, Minneapolis, MN USA
Keywords
COPY NUMBER VARIATION; GENE-EXPRESSION; BLADDER-CANCER; HUMAN GENOME; CGH DATA; ALGORITHMS; MATRIX;
DOI
10.1093/bioinformatics/btq428
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: Array comparative genomic hybridization (arrayCGH) is widely used to measure DNA copy numbers in cancer research. ArrayCGH data report log-ratio intensities of thousands of probes sampled along the chromosomes. Typically, the locations and lengths of the probes vary between experiments. This discrepancy in probe choice poses a challenge for integrated classification or analysis across multiple arrayCGH datasets. We propose an alignment-based framework to integrate arrayCGH samples generated from different probe sets. The framework seeks an optimal alignment between the probe series of one arrayCGH sample and that of another, with the aim of finding the maximum possible overlap of DNA copy number variations between the two measured chromosomes. An alignment kernel is introduced for integrative patient sample classification, and a multiple probe alignment (MPA) algorithm is introduced for identifying common regions with copy number aberrations.
Results: The probe alignment kernel and the MPA algorithm were evaluated by integrating three bladder cancer datasets as well as artificial datasets. In these experiments, integrating arrayCGH samples from multiple datasets allowed the probe alignment kernel, used with support vector machines, to significantly improve patient sample classification accuracy over baseline kernels. The experiments also demonstrated that the MPA algorithm can find common DNA aberrations that cannot be identified with the standard interpolation method. Furthermore, the MPA algorithm identified many known bladder cancer DNA aberrations containing four known bladder cancer genes, three of which cannot be detected by interpolation.
Pages: 2313-2320
Page count: 8