A self-supervised deep learning method for data-efficient training in genomics

Authors
Hüseyin Anil Gündüz
Martin Binder
Xiao-Yin To
René Mreches
Bernd Bischl
Alice C. McHardy
Philipp C. Münch
Mina Rezaei
Affiliations
[1] Department of Statistics, LMU Munich
[2] Munich Center for Machine Learning
[3] Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research
[4] Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig
[5] German Center for Infection Research (DZIF), partner site Hannover Braunschweig
[6] Department of Biostatistics, Harvard School of Public Health
Abstract
Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
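The abstract mentions two ingredients, reverse-complement sequences and prediction targets of different lengths, without giving implementation details. The following is only a minimal Python sketch of those two ideas under stated assumptions: the function names `reverse_complement` and `split_context_targets` are hypothetical, and pairing a sequence prefix with the reverse complement of its suffix is an illustrative guess, not necessarily how Self-GenomeNet constructs its training pairs.

```python
# Illustrative sketch only; NOT the Self-GenomeNet implementation.
# All names below are hypothetical and chosen for this example.

# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_context_targets(seq: str, target_lengths: list[int]) -> list[tuple[str, str]]:
    """Build (context, target) pairs for several target lengths.

    For each length t, the context is the prefix seq[:-t] and the target is
    the reverse complement of the suffix seq[-t:]. This pairing is an
    assumption made for illustration.
    """
    pairs = []
    for t in target_lengths:
        if not 0 < t < len(seq):
            raise ValueError(f"target length {t} must be in 1..{len(seq) - 1}")
        pairs.append((seq[:-t], reverse_complement(seq[-t:])))
    return pairs


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    for context, target in split_context_targets("ATGCGT", [2, 3]):
        print(context, target)
```

Short target lengths would correspond to short-range dependencies and longer ones to long-range dependencies; in a real model the string pairs would be encoded (for example one-hot) before being passed to an encoder.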
Related papers
50 in total
  • [1] A self-supervised deep learning method for data-efficient training in genomics
    Guenduez, Hueseyin Anil
    Binder, Martin
    To, Xiao-Yin
    Mreches, Rene
    Bischl, Bernd
    McHardy, Alice C.
    Muench, Philipp C.
    Rezaei, Mina
    COMMUNICATIONS BIOLOGY, 2023, 6 (01)
  • [2] A data-efficient self-supervised deep learning model for design and characterization of nanophotonic structures
    Ma, Wei
    Liu, Yongmin
    SCIENCE CHINA-PHYSICS MECHANICS & ASTRONOMY, 2020, 63 (08)
  • [3] A data-efficient self-supervised deep learning model for design and characterization of nanophotonic structures
    Wei Ma
    Yongmin Liu
    Science China Physics, Mechanics & Astronomy, 2020, 63
  • [4] A data-efficient self-supervised deep learning model for design and characterization of nanophotonic structures
    Wei Ma
    Yongmin Liu
    Science China(Physics,Mechanics & Astronomy), 2020, (08) : 27 - 34
  • [5] Self-Supervised Learning With Data-Efficient Supervised Fine-Tuning for Crowd Counting
    Wang, Rui
    Hao, Yixue
    Hu, Long
    Chen, Jincai
    Chen, Min
    Wu, Di
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1538 - 1546
  • [6] A Data-Efficient Training Method for Deep Reinforcement Learning
    Feng, Wenhui
    Han, Chongzhao
    Lian, Feng
    Liu, Xia
    ELECTRONICS, 2022, 11 (24)
  • [7] Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging
    Azizi, Shekoofeh
    Culp, Laura
    Freyberg, Jan
    Mustafa, Basil
    Baur, Sebastien
    Kornblith, Simon
    Chen, Ting
    Tomasev, Nenad
    Mitrovic, Jovana
    Strachan, Patricia
    Mahdavi, S. Sara
    Wulczyn, Ellery
    Babenko, Boris
    Walker, Megan
    Loh, Aaron
    Chen, Po-Hsuan Cameron
    Liu, Yuan
    Bavishi, Pinal
    McKinney, Scott Mayer
    Winkens, Jim
    Roy, Abhijit Guha
    Beaver, Zach
    Ryan, Fiona
    Krogue, Justin
    Etemadi, Mozziyar
    Telang, Umesh
    Liu, Yun
    Peng, Lily
    Corrado, Greg S.
    Webster, Dale R.
    Fleet, David
    Hinton, Geoffrey
    Houlsby, Neil
    Karthikesalingam, Alan
    Norouzi, Mohammad
    Natarajan, Vivek
    NATURE BIOMEDICAL ENGINEERING, 2023, 7 (06) : 756 - +
  • [8] Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging
    Shekoofeh Azizi
    Laura Culp
    Jan Freyberg
    Basil Mustafa
    Sebastien Baur
    Simon Kornblith
    Ting Chen
    Nenad Tomasev
    Jovana Mitrović
    Patricia Strachan
    S. Sara Mahdavi
    Ellery Wulczyn
    Boris Babenko
    Megan Walker
    Aaron Loh
    Po-Hsuan Cameron Chen
    Yuan Liu
    Pinal Bavishi
    Scott Mayer McKinney
    Jim Winkens
    Abhijit Guha Roy
    Zach Beaver
    Fiona Ryan
    Justin Krogue
    Mozziyar Etemadi
    Umesh Telang
    Yun Liu
    Lily Peng
    Greg S. Corrado
    Dale R. Webster
    David Fleet
    Geoffrey Hinton
    Neil Houlsby
    Alan Karthikesalingam
    Mohammad Norouzi
    Vivek Natarajan
    Nature Biomedical Engineering, 2023, 7 (6) : 756 - 779
  • [9] Data-Efficient Masked Video Modeling for Self-supervised Action Recognition
    Li, Qiankun
    Huang, Xiaolong
    Wan, Zhifan
    Hu, Lanqing
    Wu, Shuzhe
    Zhang, Jie
    Shan, Shiguang
    Wang, Zengfu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2723 - 2733
  • [10] Primitive-contrastive network: data-efficient self-supervised learning from robot demonstration videos
    Pengfei Sun
    Zhile Yang
    Tianren Zhang
    Shangqi Guo
    Feng Chen
    Applied Intelligence, 2022, 52 : 4258 - 4273