Accurate annotation of protein-coding genes in mitochondrial genomes

Cited by: 22
Authors
Al Arab, Marwa [1, 2, 8]
Hoener zu Siederdissen, Christian [1, 2, 3]
Tout, Kifah [8]
Sahyoun, Abdullah H. [1, 2, 8, 9]
Stadler, Peter F. [1, 2, 3, 4, 5, 6, 7]
Bernt, Matthias [1, 10]
Affiliations
[1] Univ Leipzig, Dept Comp Sci, Bioinformat Grp, Hartelstr 16-18, D-04107 Leipzig, Germany
[2] Univ Leipzig, Interdisciplinary Ctr Bioinformat, Hartelstr 16-18, D-04107 Leipzig, Germany
[3] Univ Vienna, Inst Theoret Chem, Wahringerstr 17, A-1090 Vienna, Austria
[4] Max Planck Inst Math Sci, Inselstr 22, D-04103 Leipzig, Germany
[5] Fraunhofer Inst Zelltherapie & Immunol, Perlickstr 1, D-04103 Leipzig, Germany
[6] Univ Copenhagen, Ctr Noncoding RNA Technol & Hlth, Gronnegardsvej 3, DK-1870 Frederiksberg C, Denmark
[7] Santa Fe Inst, 1399 Hyde Pk Rd, Santa Fe, NM 87501 USA
[8] Lebanese Univ, Doctoral Sch Sci & Technol, AZM Ctr Biotechnol Res, Tripoli, Lebanon
[9] Johannes Gutenberg Univ Mainz gGmbH, Univ Med Ctr, TRON Translat Oncol, Mainz, Germany
[10] Univ Leipzig, Parallel Comp & Complex Syst Grp, Dept Comp Sci, Augustuspl 10, D-04103 Leipzig, Germany
Keywords
Protein coding genes; Metazoa; Mitochondrial DNA; Annotation; Hidden Markov models; AUTOMATIC ANNOTATION; SEQUENCE; PHYLOGENY; DNA; TRANSCRIPTS; ALIGNMENTS; DATABASE; TURTLES; BIRDS; CODE;
DOI
10.1016/j.ympev.2016.09.024
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Mitochondrial genome sequences are available in large numbers, and new sequences are being published at an increasing pace. Fast, automatic, consistent, and high-quality annotations are a prerequisite for downstream analyses. We therefore present an automated pipeline for fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on enhanced phylogeny-aware hidden Markov models (HMMs). Using an approximation of the phylogeny, the pipeline builds taxon-specific enhanced multiple sequence alignments (MSAs) of already annotated sequences, together with the corresponding HMMs. The MSAs are enhanced by fixing unannotated frameshifts, purging wrong sequences, and removing non-conserved columns from both ends. A comparison with reference annotations highlights the high quality of the results. The frameshift correction method predicts a large number of frameshifts, many of which were not previously known. A detailed analysis of the frameshifts in nad3 of the Archosauria-Testudines group was conducted. (C) 2016 Elsevier Inc. All rights reserved.
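One of the MSA enhancement steps named in the abstract, removal of non-conserved columns from both ends, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the conservation measure (fraction of rows sharing the most common non-gap residue), and the `min_conservation` threshold are all assumptions chosen for illustration.

```python
from __future__ import annotations

from collections import Counter


def column_conservation(column: str, gap: str = "-") -> float:
    """Fraction of rows that share the most common non-gap residue.

    Hypothetical conservation measure; the paper may use a different one.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(column)


def trim_msa_ends(
    rows: list[str], min_conservation: float = 0.5, gap: str = "-"
) -> list[str]:
    """Drop leading and trailing columns whose conservation is below the threshold.

    Interior columns are never removed, only the two ends are trimmed.
    """
    if not rows:
        return []
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all alignment rows must have the same length")

    columns = ["".join(r[i] for r in rows) for i in range(length)]
    conserved = [column_conservation(c, gap) >= min_conservation for c in columns]
    if not any(conserved):
        return ["" for _ in rows]

    start = conserved.index(True)                # first conserved column
    end = length - conserved[::-1].index(True)   # one past the last conserved column
    return [r[start:end] for r in rows]


# Toy alignment (made-up sequences, for illustration only).
msa = [
    "--MTLLA-",
    "A-MTLIAK",
    "--MSLLA-",
    "-CMTLLA-",
]
trimmed = trim_msa_ends(msa, min_conservation=0.75)
print(trimmed)
```

In this toy alignment the two leading columns and the final column are mostly gaps, so they fall below the assumed threshold and are trimmed, while the partially variable interior columns are kept.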
Pages: 209-216
Page count: 8