High-fidelity (repeat) consensus sequences from short reads using combined read clustering and assembly

Cited by: 3
Authors
Mann, Ludwig [1]
Balasch, Kristin [1]
Schmidt, Nicola [1]
Heitkam, Tony [1, 2]
Affiliations
[1] Tech Univ Dresden, Fac Biol, D-01069 Dresden, Germany
[2] Karl Franzens Univ Graz, Inst Biol, NAWI Graz, A-8010 Graz, Austria
Keywords
Repetitive DNA; Transposable elements; Consensus sequences; Repeat assembly; Repeat clustering; eccDNA; Ribosomal DNA; rDNA; Non-model organisms; MALE-FERTILE; GENOME; DNA; TRANSCRIPTION; PLANTS;
DOI
10.1186/s12864-023-09948-4
Chinese Library Classification (CLC)
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Despite the many cheap and fast ways to generate genomic data, accurate genome assembly remains a problem, with repeats in particular being vastly underrepresented and often misassembled. Because low-coverage short reads are already sufficient to represent the repeat landscape of any given genome, many read clustering algorithms have been proposed that provide repeat identification and classification. But how can trustworthy, reliable and representative repeat consensuses be derived from unassembled genomes?
Results: Here, we combine methods from repeat identification and genome assembly to derive these robust consensuses. We test several use cases: (1) consensus building from clustered short reads of non-model genomes, (2) consensus building from genome-wide amplification setups, and (3) specific repeat-centred questions, such as the linked vs. unlinked arrangement of ribosomal genes. In all our use cases, the derived consensuses are robust and representative. To evaluate overall performance, we compare our high-fidelity repeat consensuses to RepeatExplorer2-derived contigs and check whether they represent real transposable elements as found in long reads. Our results demonstrate that it is possible to generate useful, reliable and trustworthy consensuses from short reads by combining read clustering and genome assembly methods in an automatable way.
Conclusion: We anticipate that our workflow opens the way towards more efficient and less manual repeat characterization and annotation, benefitting all genome studies, but especially those of non-model organisms.
Pages: 11