The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes

Times cited: 15
Authors
Estill, James C. [1]
Bennetzen, Jeffrey L. [2]
Affiliations
[1] Univ Georgia, Dept Plant Biol, Athens, GA 30602 USA
[2] Univ Georgia, Dept Genet, Athens, GA 30602 USA
Keywords
DE-NOVO IDENTIFICATION; DATABASE; PREDICTION; ALIGNMENT; SEQUENCE; PROGRAM; FAMILIES; VISUALIZATION; RESOURCE; BROWSER
DOI
10.1186/1746-4811-5-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: High-quality annotation of the genes and transposable elements in complex genomes requires human-curated integration of multiple sources of computational evidence. This evidence includes results from a diversity of ab initio prediction programs as well as homology-based searches. Most of these programs operate on a single contiguous sequence at a time, and their results are generated in a diverse array of readable formats that must be translated into a standardized file format. The translated results must then be concatenated into a single source and presented in an integrated form for human curation.

Results: We have designed, implemented, and assessed a Perl-based workflow named DAWGPAWS that generates computational results for human curation of the genes and transposable elements in plant genomes. Using DAWGPAWS accelerated the annotation of 80-200 kb wheat DNA inserts in bacterial artificial chromosome (BAC) vectors approximately twenty-fold. It also significantly improved the completeness and accuracy of the annotation.

Conclusion: The DAWGPAWS genome annotation pipeline fills an important need in plant genome annotation. It generates computational evidence in a high-throughput manner, translates these results into a common file format, and facilitates human curation of the results. We have verified the value of DAWGPAWS by using the pipeline to annotate the genes and transposable elements in 220 BAC inserts from the hexaploid wheat genome (Triticum aestivum L.). DAWGPAWS can be applied to annotation efforts in other plant genomes with minor modifications to program-specific configuration files, and the modular design of the workflow facilitates its integration into existing pipelines.
Pages: 11