Integration of multi-omics data for prediction of phenotypic traits using random forest

Cited by: 64
Authors
Acharjee, Animesh [1 ,3 ]
Kloosterman, Bjorn [1 ,2 ]
Visser, Richard G. F. [1 ]
Maliepaard, Chris [1 ]
Affiliations
[1] Univ Wageningen & Res Ctr, Wageningen UR Plant Breeding, NL-6700 AJ Wageningen, Netherlands
[2] Keygene NV, POB 216, NL-6700 AE Wageningen, Netherlands
[3] MRC Human Nutr Res, 120 Fulbourn Rd, Cambridge CB1 9NL, England
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Keywords
Data integration; Genetical genomics; Networks; Random forest; GENETIC GENOMICS; POTATO; EXPRESSION; QTL; RNA;
DOI
10.1186/s12859-016-1043-4
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: In order to find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This gives us networks of relatively small sets of interrelated ~omics variables that can predict, with higher accuracy, a quality trait of interest.
Results: We used Random Forest regression for integrating multiple ~omics data for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour, beta-carotene hydroxylase and zeaxanthin epoxidase were ranked first and forty-fourth respectively, both of which have previously been associated with flesh colour in potato tubers. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75 %, only slightly more than what gene expression or LC-MS data explain by themselves, which indicates that there are correlations among the variables across data sets. For tuber shape regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only gene expression data was found to explain significant variation. For DSC onset, we found 12 significant gene expression variables, 5 metabolite levels (GC) and 2 proteins that are associated with the trait. Using those 19 significant variables, the variation explained was 45 %. Expression QTL (eQTL) analyses showed many associations with genomic regions on chromosome 2, which also had the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis on enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, among which two were putatively identified as caffeoylquinic acid methyl ester and tyrosine.
Conclusions: In this study, we developed a strategy for selecting and integrating multiple ~omics data using the random forest method and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was done to interpret how a particular trait is associated with gene expression, metabolite and protein data.
Pages: 11
Related papers
50 items in total
  • [21] Alzheimer's disease prediction based on continuous feature representation using multi-omics data integration
    Abbas, Zeeshan
    Tayara, Hilal
    Chong, Kil To
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2022, 223
  • [22] Multi-Omics Integration Analysis for the Osteoporosis Prediction through Deep Flexible Neural Forest Framework
    Gong, Yun
    Jiang, Lindong
    Liu, Anqi
    Zhang, Xiao
    Shen, Hui
    Deng, Hong-Wen
    JOURNAL OF BONE AND MINERAL RESEARCH, 2023, 38 : 233 - 233
  • [23] Multi-omics Data Integration, Interpretation, and Its Application
    Subramanian, Indhupriya
    Verma, Srikant
    Kumar, Shiva
    Jere, Abhay
    Anamika, Krishanpal
    BIOINFORMATICS AND BIOLOGY INSIGHTS, 2020, 14
  • [24] Optimizing network propagation for multi-omics data integration
    Charmpi, Konstantina
    Chokkalingam, Manopriya
    Johnen, Ronja
    Beyer, Andreas
    PLOS COMPUTATIONAL BIOLOGY, 2021, 17 (11)
  • [25] ‘Multi-omics’ data integration: applications in probiotics studies
    Iliya Dauda Kwoji
    Olayinka Ayobami Aiyegoro
    Moses Okpeku
    Matthew Adekunle Adeleke
    npj Science of Food, 7
  • [26] Methods for the integration of multi-omics data: mathematical aspects
    Bersanelli, Matteo
    Mosca, Ettore
    Remondini, Daniel
    Giampieri, Enrico
    Sala, Claudia
    Castellani, Gastone
    Milanesi, Luciano
    BMC BIOINFORMATICS, 2016, 17
  • [27] Multi-omics data integration by generative adversarial network
    Ahmed, Khandakar Tanvir
    Sun, Jiao
    Cheng, Sze
    Yong, Jeongsik
    Zhang, Wei
    BIOINFORMATICS, 2022, 38 (01) : 179 - 186
  • [28] A survey on data integration for multi-omics sample clustering
    Lovino, Marta
    Randazzo, Vincenzo
    Ciravegna, Gabriele
    Barbiero, Pietro
    Ficarra, Elisa
    Cirrincione, Giansalvo
    NEUROCOMPUTING, 2022, 488 : 494 - 508
  • [29] Prospects and challenges of multi-omics data integration in toxicology
    Sebastian Canzler
    Jana Schor
    Wibke Busch
    Kristin Schubert
    Ulrike E. Rolle-Kampczyk
    Hervé Seitz
    Hennicke Kamp
    Martin von Bergen
    Roland Buesen
    Jörg Hackermüller
    Archives of Toxicology, 2020, 94 : 371 - 388
  • [30] Methods for the integration of multi-omics data: mathematical aspects
    Matteo Bersanelli
    Ettore Mosca
    Daniel Remondini
    Enrico Giampieri
    Claudia Sala
    Gastone Castellani
    Luciano Milanesi
    BMC Bioinformatics, 17