Cross-validation and cross-study validation of chronic lymphocytic leukemia with exome sequences and machine learning

Authors
Patel, Nihir [1]
Jadhav, Bharati [1]
Aljouie, Abdulrhman [2]
Roshan, Usman [2]
Affiliations
[1] Mt Sinai Hosp, Hess Ctr Sci & Med, Icahn Sch Med, Dept Genet & Genom Sci, New York, NY 10029 USA
[2] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
Keywords
GENOME-WIDE ASSOCIATION; TRANSFER-RNA SYNTHETASES; INDIVIDUAL GENETIC RISK; PREDICTION; DISEASE; SUSCEPTIBILITY; POLYMORPHISMS; MUTATIONS; FRAMEWORK; ANCESTRY
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The era of genomics brings the potential for better DNA-based risk prediction and treatment. While genome-wide association studies have been studied extensively for risk prediction, the potential of whole exome data for this purpose is unclear. We explore this problem for chronic lymphocytic leukemia, using one of the largest whole exome datasets available from the NIH dbGaP database, with 186 cases and 169 controls. We perform a standard next-generation sequencing analysis procedure to obtain SNP variants on 153 cases and 144 controls after excluding samples with missing data. To evaluate their predictive power, we first conduct a cross-validation study with 50% training and 50% test splits on the full dataset, using the support vector machine as the classifier. There we obtain a mean accuracy of 82% with the top 20 SNPs ranked by the Pearson correlation coefficient. We then perform a cross-study validation on cases and controls from an external lymphoma study and on controls only from head and neck cancer and breast cancer studies (all obtained from NIH dbGaP). On the external dataset we obtain an accuracy of 70% with the top-ranked SNPs obtained from the original dataset. We also find that our top Pearson-ranked SNPs lie on genes previously implicated in this disease. Our study shows that even with a small sample size we can obtain moderate to high accuracy with exome sequences, which is encouraging for future work.
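The cross-validation step described in the abstract (repeated 50/50 splits, Pearson ranking of SNPs, a support vector machine on the top-ranked SNPs) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: all function names are hypothetical, the linear SVM is trained with a simple subgradient descent that stands in for whatever SVM implementation the study used, the toy genotypes are synthetic, and ranking SNPs on the training half only is an assumption that the abstract does not confirm.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def rank_snps(X, y):
    """SNP column indices sorted by decreasing |r| with the case/control labels."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=100, seed=0):
    """Linear SVM via constant-step subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # shrinkage from the L2 penalty
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, row):
    """Predicted label in {-1, +1} for one genotype row."""
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1


def split_accuracy(X, y, top_k, rng):
    """One random 50/50 split: rank SNPs on the training half only (assumption)."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    y_train = [y[i] for i in train]
    top = rank_snps([X[i] for i in train], y_train)[:top_k]
    w, b = train_linear_svm([[X[i][j] for j in top] for i in train], y_train)
    hits = sum(predict(w, b, [X[i][j] for j in top]) == y[i] for i in test)
    return hits / len(test)


def mean_accuracy(X, y, top_k=20, n_splits=10, seed=0):
    """Mean test accuracy over repeated random 50/50 splits."""
    rng = random.Random(seed)
    return sum(split_accuracy(X, y, top_k, rng) for _ in range(n_splits)) / n_splits


def make_toy_data(n=200, n_snps=30, n_causal=3, seed=1):
    """Synthetic 0/1/2 genotypes; the first n_causal SNPs track the label 90% of the time."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1  # +1 = case, -1 = control
        row = [rng.choice([0, 1, 2]) for _ in range(n_snps)]
        for j in range(n_causal):
            agrees = rng.random() < 0.9
            row[j] = 2 if (label == 1) == agrees else 0
        X.append(row)
        y.append(label)
    return X, y
```

Calling `mean_accuracy(*make_toy_data(), top_k=3)` runs the whole loop on the synthetic data. For the cross-study setting, the analogous sketch would rank and train once on the full original dataset and then call `predict` on the same SNP columns of the external samples.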
Pages: 1367-1374
Page count: 8