Automated Prediction and Annotation of Small Open Reading Frames in Microbial Genomes

Cited by: 24
Authors
Durrant, Matthew G. [1, 2]
Bhatt, Ami S. [1, 2]
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Med Hematol Blood & Marrow Transplantat, Stanford, CA 94305 USA
Funding
US National Science Foundation;
Keywords
RNA; ALIGNMENT; BACTERIAL; PROTEINS; HIDDEN; SUITE;
DOI
10.1016/j.chom.2020.11.002
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.
Pages: 121 / +
Page count: 15
Related papers
50 in total
  • [21] HOW TO DEAL WITH SMALL OPEN READING FRAMES?
    Wanczyk, Malgorzata
    Blazej, Pawel
    Mackiewicz, Pawel
    Cebrat, Stanislaw
    BIOINFORMATICS: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIOINFORMATICS MODELS, METHODS AND ALGORITHMS, 2012, : 246 - 250
  • [22] Small open reading frames and cellular stress responses
    Khitun, Alexandra
    Ness, Travis J.
    Slavoff, Sarah A.
    MOLECULAR OMICS, 2019, 15 (02) : 108 - 116
  • [23] Small open reading frames: Beautiful needles in the haystack
    Basrai, MA
    Hieter, P
    Boeke, JD
    GENOME RESEARCH, 1997, 7 (08) : 768 - 771
  • [24] OpenVar: functional annotation of variants in non-canonical open reading frames
    Brunet, Marie A.
    Leblanc, Sebastien
    Roucou, Xavier
    CELL AND BIOSCIENCE, 2022, 12 (01):
  • [26] THE PREDICTION OF EXONS THROUGH AN ANALYSIS OF SPLICEABLE OPEN READING FRAMES
    HUTCHINSON, GB
    HAYDEN, MR
    NUCLEIC ACIDS RESEARCH, 1992, 20 (13) : 3453 - 3462
  • [27] Automated identification of putative methyltransferases from genomic open reading frames
    Katz, JE
    Dlakic, M
    Clarke, S
    MOLECULAR & CELLULAR PROTEOMICS, 2003, 2 (08) : 525 - 540
  • [28] Filtering "genic" open reading frames from genomic DNA samples for advanced annotation
    Sara D'Angelo
    Nileena Velappan
    Flavio Mignone
    Claudio Santoro
    Daniele Sblattero
    Csaba Kiss
    Andrew RM Bradbury
    BMC Genomics, 12
  • [29] uORF4u: a tool for annotation of conserved upstream open reading frames
    Egorov, Artyom A.
    Atkinson, Gemma C.
    BIOINFORMATICS, 2023, 39 (05)
  • [30] Small open reading frames: a comparative genetics approach to validation
    Niyati Jain
    Felix Richter
    Ivan Adzhubei
    Andrew J. Sharp
    Bruce D. Gelb
    BMC Genomics, 24