Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities

Times cited: 325
Authors
Zitnik, Marinka [1 ]
Nguyen, Francis [2 ,3 ]
Wang, Bo [4 ]
Leskovec, Jure [1 ,5 ]
Goldenberg, Anna [6 ,7 ,8 ]
Hoffman, Michael M. [2 ,3 ,7 ,8 ]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Univ Toronto, Dept Med Biophys, Toronto, ON, Canada
[3] Princess Margaret Canc Ctr, Toronto, ON, Canada
[4] Hikvis Res Inst, Santa Clara, CA USA
[5] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
[6] SickKids Res Inst, Genet & Genome Biol, Toronto, ON, Canada
[7] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[8] Vector Inst, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; U.S. National Science Foundation
Keywords
Computational biology; Personalized medicine; Systems biology; Heterogeneous data; Machine learning; DRUG-DRUG INTERACTION; GENOME-WIDE ASSOCIATION; DNA METHYLATION; DATA FUSION; TRANSCRIPTION FACTORS; CHROMATIN-STATE; CHIP-SEQ; PROBABILISTIC FUNCTIONS; MULTICELLULAR FUNCTION; HETEROGENEOUS NETWORK
DOI
10.1016/j.inffus.2018.09.012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Pages: 71-91
Page count: 21