Integration of multi-omics data for prediction of phenotypic traits using random forest

Cited by: 64
Authors
Acharjee, Animesh [1, 3]
Kloosterman, Bjorn [1, 2]
Visser, Richard G. F. [1 ]
Maliepaard, Chris [1 ]
Affiliations
[1] Univ Wageningen & Res Ctr, Wageningen UR Plant Breeding, NL-6700 AJ Wageningen, Netherlands
[2] Keygene NV, POB 216, NL-6700 AE Wageningen, Netherlands
[3] MRC Human Nutr Res, 120 Fulbourn Rd, Cambridge CB1 9NL, England
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Keywords
Data integration; Genetical genomics; Networks; Random forest; GENETIC GENOMICS; POTATO; EXPRESSION; QTL; RNA;
DOI
10.1186/s12859-016-1043-4
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: In order to find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This gives us networks of relatively small sets of interrelated ~omics variables that can predict, with higher accuracy, a quality trait of interest.

Results: We used Random Forest regression for integrating multiple ~omics data for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour beta-carotene hydroxylase and zeaxanthin epoxidase were ranked first and forty-fourth respectively both of which have previously been associated with flesh colour in potato tubers. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75 %, only slightly more than what gene expression or LC-MS data explain by themselves which indicates that there are correlations among the variables across data sets. For tuber shape regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only gene expression data was found to explain significant variation. For DSC onset, we found 12 significant gene expression, 5 metabolite levels (GC) and 2 proteins that are associated with the trait. Using those 19 significant variables, the variation explained was 45 %. Expression QTL (eQTL) analyses showed many associations with genomic regions in chromosome 2 with also the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis on enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, among which two were putatively identified as caffeoylquinic acid methyl ester and tyrosine.

Conclusions: In this study, we made a strategy for selecting and integrating multiple ~omics data using random forest method and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was done to interpret how a particular trait is associated with gene expression, metabolite and protein data.
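
Illustrative sketch (not the authors' code): the abstract describes regressing a trait on several omics data sets with Random Forest, ranking variables, and reporting the variation explained. A minimal Python version of that general idea might look like the following. It assumes scikit-learn is available, uses synthetic data in place of the potato measurements, and uses the out-of-bag R^2 as a stand-in for "variation explained". The function name `integrate_and_rank`, the block names, and all parameter values are hypothetical and do not come from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def integrate_and_rank(blocks, y, n_trees=200, seed=0):
    """Concatenate omics blocks column-wise, fit a random forest regressor,
    and return (out-of-bag R^2, [(feature name, importance), ...] sorted
    from most to least important)."""
    # Label each column with its block of origin, e.g. "gene:0", "lcms:3".
    names = [f"{block}:{j}" for block, X in blocks.items() for j in range(X.shape[1])]
    X_all = np.hstack(list(blocks.values()))

    forest = RandomForestRegressor(
        n_estimators=n_trees, oob_score=True, random_state=seed
    )
    forest.fit(X_all, y)

    # Impurity-based importances; the paper's own ranking and significance
    # procedure may differ from this.
    order = np.argsort(forest.feature_importances_)[::-1]
    ranking = [(names[i], float(forest.feature_importances_[i])) for i in order]
    return forest.oob_score_, ranking


# Synthetic stand-in data: 100 samples, four hypothetical omics blocks.
rng = np.random.default_rng(0)
n = 100
blocks = {
    "gene": rng.normal(size=(n, 20)),
    "lcms": rng.normal(size=(n, 10)),
    "gcms": rng.normal(size=(n, 10)),
    "protein": rng.normal(size=(n, 5)),
}
# The simulated trait depends mainly on one transcript and one LC-MS peak.
y = 2.0 * blocks["gene"][:, 0] + 1.0 * blocks["lcms"][:, 3] + rng.normal(scale=0.3, size=n)

oob_r2, ranking = integrate_and_rank(blocks, y)
print(f"Out-of-bag R^2: {oob_r2:.2f}")
print("Top variables:", [name for name, _ in ranking[:5]])
```

Fitting the same function on a single block (for example `{"gene": blocks["gene"]}`) and comparing its out-of-bag R^2 with the combined fit gives a rough analogue of the abstract's comparison between individual and integrated data sets.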
Pages: 11