Multi-Omic Data Improve Prediction of Personalized Tumor Suppressors and Oncogenes

Cited by: 0
Authors
Sudhakar, Malvika [1, 2, 3]
Rengaswamy, Raghunathan [1, 2, 4]
Raman, Karthik [1, 2, 3]
Affiliations
[1] Indian Inst Technol IIT Madras, Ctr Integrat Biol & Syst Med IBSE, Chennai, India
[2] IIT Madras, Robert Bosch Ctr Data Sci & Artificial Intelligenc, Chennai, India
[3] IIT Madras, Bhupat & Jyoti Mehta Sch Biosci, Dept Biotechnol, Chennai, India
[4] IIT Madras, Dept Chem Engn, Chennai, India
Keywords
machine learning; driver genes; personalized driver genes; cancer genomics; PIVOT; DIFFERENTIAL EXPRESSION ANALYSIS; BREAST-CANCER; MUTATIONAL PROCESSES; DRIVER GENES; HETEROGENEITY; SIGNATURES; EVOLUTION; RECEPTOR; SOX9;
DOI
10.3389/fgene.2022.854190
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes but require multiple samples to identify less frequently mutated driver genes. Many studies use different methods to identify driver mutations/genes from mutations that have no impact on tumor progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify personalized driver genes based on changes in expression. Our method is the first machine learning model to classify genes as tumor suppressor gene (TSG), oncogene (OG), or neutral, thus assigning the functional impact of the gene in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalized Identification of driVer OGs and TSGs), to train on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalized driver genes, we label the data using four strategies and, based on classification metrics, show gene-based labeling strategies perform best. We build different models using SNV, RNA, and multi-omic features to be used based on the data available. Our models trained on multi-omic data improved predictions compared with mutation and expression data, achieving an accuracy ≥ 0.99 for BRCA, LUAD, and COAD datasets. We show network and expression-based features contribute the most to PIVOT. Our predictions on BRCA, COAD, and LUAD cancer types reveal commonly altered genes such as TP53 and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9, and PSMD4. Our multi-omic model labels both CNV and mutations with a more considerable contribution by CNV alterations. While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalized driver genes as TSGs and OGs and also identifies rare driver genes.
Pages: 16