Evaluation of a two-stage framework for prediction using big genomic data

Cited by: 3
Authors
Jiang, Xia [1]
Neapolitan, Richard E. [2]
Affiliations
[1] Univ Pittsburgh, Biomed Informat, Pittsburgh, PA 15260 USA
[2] Northwestern Univ, Biomed Informat, Evanston, IL 60208 USA
Keywords
big data; high-dimensional data; prediction; Bayesian network; GWAS; SNP; WIDE ASSOCIATION; EPISTATIC INTERACTIONS; LOGISTIC-REGRESSION; ALZHEIMERS-DISEASE; GENE-GENE; RISK; STRATEGIES; ALGORITHM; INFERENCE; VARIANTS;
DOI
10.1093/bib/bbv010
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.
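The two-stage idea described in the abstract (rank features first, then hand only the top-ranked ones to a prediction method) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified Relief-style scorer for discrete features and a binary outcome, using Hamming distance and the k nearest same-class and other-class neighbours. The function names `relief_scores` and `top_features`, the default `k=5`, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def relief_scores(X, y, k=5):
    """Relief-style weights for discrete features and a binary target.

    Simplified illustration (an assumption, not the paper's code): every
    sample is visited once, neighbours are found by Hamming distance, and
    a feature gains weight when it differs on nearest misses and loses
    weight when it differs on nearest hits.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n, p = X.shape
    weights = np.zeros(p)
    for i in range(n):
        diff = (X != X[i]).astype(float)  # per-feature mismatch with sample i
        dist = diff.sum(axis=1)           # Hamming distance to sample i
        dist[i] = np.inf                  # never pick the sample itself
        same = y == y[i]
        for mask, sign in ((same, -1.0), (~same, 1.0)):
            idx = np.flatnonzero(mask & np.isfinite(dist))
            if idx.size == 0:
                continue                  # no hit (or no miss) available
            nearest = idx[np.argsort(dist[idx], kind="stable")[:k]]
            weights += sign * diff[nearest].mean(axis=0)
    return weights / n


def top_features(X, y, m, k=5):
    """Stage 1: indices of the m highest-scoring features."""
    scores = relief_scores(X, y, k=k)
    return np.argsort(-scores, kind="stable")[:m]


if __name__ == "__main__":
    # Hypothetical SNP-like data: genotypes coded 0/1/2, binary disease status.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200)
    X = rng.integers(0, 3, size=(200, 10))
    X[:, 0] = 2 * y                       # feature 0 is made fully informative
    selected = top_features(X, y, m=3)
    X_reduced = X[:, selected]            # stage 2 would train a predictor on this
    print(selected, X_reduced.shape)
```

In a two-stage setup the reduced matrix would then be passed to whichever prediction method is being evaluated. Note that the abstract reports that this filtering step did not improve the BN-based methods.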
Pages: 912-921
Page count: 10
Related papers
50 in total
  • [1] A Two-Stage Framework for Big Spatial Data Analytics to Support Disaster Response
    Hu, Xuan
    Gong, Jie
    Renard, Eduard Gibert
    Parashar, Manish
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 5409 - 5418
  • [2] A Two-stage Method of Synchronization Prediction Framework in TDD
    Hsieh, Chao-Hsien
    Wang, Ziyi
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2022, 47 (02) : 2345 - 2357
  • [3] Two-stage Hierarchical Framework for Solar Flare Prediction
    Deng, Hao
    Zhong, Yuting
    Chen, Hong
    Chen, Jun
    Wang, Jingjing
    Chen, Yanhong
    Luo, Bingxian
    [J]. ASTROPHYSICAL JOURNAL SUPPLEMENT SERIES, 2023, 268 (02):
  • [4] A Two-Stage Framework for Directed Hypergraph Link Prediction
    Xiao, Guanchen
    Liao, Jinzhi
    Tan, Zhen
    Zhang, Xiaonan
    Zhao, Xiang
    [J]. MATHEMATICS, 2022, 10 (14)
  • [6] A two-stage head pose estimation framework and evaluation
    Wu, Junwen
    Trivedi, Mohan M.
    [J]. PATTERN RECOGNITION, 2008, 41 (03) : 1138 - 1158
  • [7] A data-driven two-stage maintenance framework for degradation prediction in semiconductor manufacturing industries
    Luo, Ming
    Yan, Heng-Chao
    Hu, Bin
    Zhou, Jun-Hong
    Pang, Chee Khiang
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2015, 85 : 414 - 422
  • [8] Analyzing Viral Genomic Data Using Hadoop Framework in Big Data
    Nagpal, Disha
    Sood, Shriya
    Mohagaonkar, Sanika
    Sharma, Himanshu
    Saxena, Ankur
    [J]. PROCEEDINGS OF THE 2019 6TH INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2019, : 680 - 685
  • [9] A Two-Stage Classification Framework for Imbalanced Data with Overlapping Labels
    Zhou, Pei-Yuan
    Mo, Wenting
    Tian, Chunhua
    Li, Li
    Rui, Xiaoguang
    Wang, Haifeng
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON SERVICE OPERATIONS AND LOGISTICS, AND INFORMATICS (SOLI), 2014, : 350 - 355
  • [10] Boosting Advanced Nasopharyngeal Carcinoma Stage Prediction Using a Two-Stage Classification Framework Based on Deep Learning
    Huang, Jin
    He, Ruhan
    Chen, Jia
    Li, Song
    Deng, Yuqin
    Wu, Xinglong
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 14