Automated Prediction and Annotation of Small Open Reading Frames in Microbial Genomes

Cited by: 24
Authors
Durrant, Matthew G. [1 ,2 ]
Bhatt, Ami S. [1 ,2 ]
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Med Hematol Blood & Marrow Transplantat, Stanford, CA 94305 USA
Funding
National Science Foundation (USA);
Keywords
RNA; ALIGNMENT; BACTERIAL; PROTEINS; HIDDEN; SUITE;
DOI
10.1016/j.chom.2020.11.002
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.
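To make the term "small open reading frame" concrete, the sketch below enumerates candidate smORFs in a DNA string by scanning all six reading frames for start-to-stop stretches under a length cutoff. It is only an illustration of what a candidate smORF is, not the SmORFinder method: the abstract describes profile hidden Markov models and deep learning models, neither of which appears here. The function name `candidate_smorfs`, the start-codon set, and the 50-codon cutoff are all assumptions chosen for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons
MAX_CODONS = 50  # illustrative cutoff; not stated in the abstract


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_smorfs(seq, max_codons=MAX_CODONS):
    """Yield (strand, start, end, orf) for short start-to-stop ORFs.

    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand (the reverse complement for strand "-"). Only the first start
    codon upstream of each stop is reported.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3  # coding codons, stop excluded
                    if n_codons <= max_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None


if __name__ == "__main__":
    for hit in candidate_smorfs("CCATGAAATTTTAGCC"):
        print(hit)
```

A scan like this over-predicts heavily, because most short start-to-stop stretches in a genome are not translated. That is why the abstract points to conservation signals, family-level models, and Ribo-seq evidence for separating real smORFs from spurious ones.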
Pages: 121 / +
Page count: 15