Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping

Cited by: 33
Authors
Hoellerer, Simon [1]
Papaxanthos, Laetitia [1, 2]
Gumpinger, Anja Cathrin [1, 2]
Fischer, Katrin [1]
Beisel, Christian [1]
Borgwardt, Karsten [1, 2]
Benenson, Yaakov [1]
Jeschek, Markus [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, CH-4058 Basel, Switzerland
[2] Swiss Inst Bioinformat, CH-4058 Basel, Switzerland
Keywords
RIBOSOME BINDING-SITES; GENE-REGULATORY LOGIC; TRANSLATION INITIATION; ESCHERICHIA-COLI; EXPRESSION; DESIGN; TRANSCRIPTION; PREDICTION
DOI
10.1038/s41467-020-17222-4
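Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09
Abstract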
Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific recombinase to directly record a GRE's effect in DNA, enabling readout of both sequence and quantitative function for extremely large GRE-sets via next-generation sequencing. We record translation kinetics of over 300,000 bacterial ribosome binding sites (RBSs) in >2.7 million sequence-function pairs in a single experiment. Further, we introduce a deep learning approach employing ensembling and uncertainty modelling that predicts RBS function with high accuracy, outperforming state-of-the-art methods. DNA-based phenotypic recording combined with deep learning represents a major advance in our ability to predict function from genetic sequence.
Current methods to generate sequence-function data at large scale are either technically complex or limited to specific applications. Here the authors introduce DNA-based phenotypic recording to overcome these limitations and enable deep learning for accurate prediction of function from sequence.
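The abstract names two ideas that a small example can make concrete: reading a quantitative value per RBS out of sequencing reads, and combining an ensemble of predictors into a prediction with an uncertainty estimate. The Python sketch below illustrates both under loudly stated assumptions and is not the authors' pipeline. It assumes each read has already been reduced to a hypothetical `(rbs_sequence, is_flipped)` pair, where `is_flipped` marks whether the recombinase target in that read was recombined, and it uses the spread across ensemble members as a simple stand-in for uncertainty. The function names and the toy models are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flipped_fraction(reads):
    """Aggregate reads into one value per RBS.

    `reads` is an iterable of (rbs_sequence, is_flipped) pairs, one per
    sequencing read. This input format is an assumption of this sketch.
    Returns a dict mapping each RBS to the fraction of its reads that
    carry the recombined (flipped) state.
    """
    counts = defaultdict(lambda: [0, 0])  # rbs -> [flipped reads, total reads]
    for rbs, is_flipped in reads:
        counts[rbs][0] += int(is_flipped)
        counts[rbs][1] += 1
    return {rbs: flipped / total for rbs, (flipped, total) in counts.items()}


def ensemble_predict(models, rbs):
    """Average the predictions of several models for one RBS.

    `models` is a list of callables mapping a sequence to a float
    (hypothetical stand-ins for trained networks). Returns the ensemble
    mean and the population standard deviation across members, used here
    as a crude uncertainty proxy.
    """
    predictions = [model(rbs) for model in models]
    return mean(predictions), pstdev(predictions)


# Toy data: four reads for one RBS, two reads for another.
toy_reads = [
    ("AGGAGG", True), ("AGGAGG", False), ("AGGAGG", True), ("AGGAGG", True),
    ("TTTTTT", False), ("TTTTTT", False),
]
fractions = flipped_fraction(toy_reads)

# Two toy "models" that return fixed values regardless of input.
toy_models = [lambda seq: 0.25, lambda seq: 0.75]
prediction, spread = ensemble_predict(toy_models, "AGGAGG")
```

In this toy setting a large `spread` would flag sequences on which the ensemble members disagree; the published approach models uncertainty in its own way, which this sketch does not attempt to reproduce.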
Pages: 15
Related papers
50 in total
  • [1] Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping
    Simon Höllerer
    Laetitia Papaxanthos
    Anja Cathrin Gumpinger
    Katrin Fischer
    Christian Beisel
    Karsten Borgwardt
    Yaakov Benenson
    Markus Jeschek
    Nature Communications, 11
  • [2] Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning
    Song, Hyebin
    Bremer, Bennett J.
    Hinds, Emily C.
    Raskutti, Garvesh
    Romero, Philip A.
    CELL SYSTEMS, 2021, 12 (01) : 92 - +
  • [3] Large-scale mapping of sequence-function relations in small regulatory RNAs reveals plasticity and modularity
    Peterman, Neil
    Lavi-Itzkovitz, Anat
    Levine, Erel
    NUCLEIC ACIDS RESEARCH, 2014, 42 (19) : 12177 - 12188
  • [4] Elucidating sequence-function relationships in a template-independent polymerase to enable novel DNA recording applications
    Milisavljevic, Marija
    Rodriguez, Teresa Rojas
    Tyo, Keith E. J.
    BIOTECHNOLOGY AND BIOENGINEERING, 2024, 121 (12) : 3808 - 3821
  • [5] Large-scale DNA-based typing of HLA-A and HLA-B at low resolution is highly accurate specific and reliable
    Hurley, CK
    Maiers, M
    Ng, J
    Wagage, D
    Hegland, J
    Baisch, J
    Endres, R
    Fernandez-Vina, M
    Heine, U
    Hsu, S
    Kamoun, M
    Mitsuishi, Y
    Monos, D
    Noreen, H
    Perlee, L
    Rodriguez-Marino, S
    Smith, A
    Stastny, P
    Trucco, M
    Yang, SY
    Yu, N
    Holsten, R
    Hartzman, RJ
    Setterholm, M
    TISSUE ANTIGENS, 2000, 55 (04) : 352 - 358
  • [6] Is a large-scale DNA-based inventory of ancient life possible?
    Lambert, DM
    Baker, A
    Huynen, L
    Haddrath, O
    Hebert, PDN
    Millar, CD
    JOURNAL OF HEREDITY, 2005, 96 (03) : 279 - 284
  • [7] Automated DNA-based plant identification for large-scale biodiversity assessment
    Papadopoulou, Anna
    Chesters, Douglas
    Coronado, Indiana
    De la Cadena, Gissela
    Cardoso, Anabela
    Reyes, Jazmina C.
    Maes, Jean-Michel
    Rueda, Ricardo M.
    Gomez-Zurita, Jesus
    MOLECULAR ECOLOGY RESOURCES, 2015, 15 (01) : 136 - 152
  • [8] Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations
    Peterman, Neil
    Levine, Erel
    BMC GENOMICS, 2016, 17
  • [9] A FAST AND PRECISE METHOD FOR LARGE-SCALE LAND-USE MAPPING BASED ON DEEP LEARNING
    Yang, Xuan
    Chen, Zhengchao
    Li, Baipeng
    Peng, Dailiang
    Chen, Pan
    Zhang, Bing
    2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019 : 5913 - 5916