Omics data integration in computational biology viewed through the prism of machine learning paradigms

Cited by: 3
Authors
Fouche, Aziz [1, 2, 3, 4]
Zinovyev, Andrei [5]
Affiliations
[1] PSL Res Univ, Inst Curie, Paris, France
[2] INSERM, Paris, France
[3] PSL Res Univ, ParisTech, CBIO Ctr Computat Biol, Paris, France
[4] Ecole Normale Super Paris Saclay, Cachan, France
[5] Evotec, In Silico R&D, Toulouse, France
Keywords
single-cell; data integration; machine learning; batch effect; multi-omics; CELL TRANSCRIPTOMIC DATA; GENE-EXPRESSION; SEQ DATA; SINGLE;
DOI
10.3389/fbinf.2023.1191961
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Large quantities of biological data can now be acquired to characterize cell types and states, from various sources and using a wide diversity of methods, giving scientists ever more information to answer challenging biological questions. Unfortunately, working with this amount of data comes at the price of ever-increasing data complexity. This complexity stems from the multiplication of data types and batch effects, which hinders the joint use of all available data within common analyses. Data integration describes a set of tasks geared towards embedding several datasets of different origins or modalities into a joint representation that can then be used to carry out downstream analyses. In the last decade, dozens of methods relying on various paradigms have been proposed to tackle the different facets of the data integration problem. This review introduces the most common data types encountered in computational biology and provides systematic definitions of the data integration problems. We then present how machine learning innovations were leveraged to build effective data integration algorithms that are widely used today by computational biologists. We discuss the current state of data integration and important pitfalls to consider when working with data integration tools. Finally, we detail a set of challenges the field will have to overcome in the coming years.
Pages: 12