Background
Despite the many cheap and fast ways to generate genomic data, accurate genome assembly remains a problem, with repeats in particular being vastly underrepresented and often misassembled. As low-coverage short reads are already sufficient to represent the repeat landscape of any given genome, many read clustering algorithms have been proposed that provide repeat identification and classification. But how can trustworthy, reliable and representative repeat consensuses be derived from unassembled genomes?

Results
Here, we combine methods from repeat identification and genome assembly to derive these robust consensuses. We test several use cases: (1) consensus building from clustered short reads of non-model genomes, (2) consensus building from genome-wide amplification setups, and (3) specific repeat-centred questions, such as the linked vs. unlinked arrangement of ribosomal genes. In all our use cases, the derived consensuses are robust and representative. To evaluate overall performance, we compare our high-fidelity repeat consensuses to RepeatExplorer2-derived contigs and check whether they represent real transposable elements as found in long reads. Our results demonstrate that it is possible to generate useful, reliable and trustworthy consensuses from short reads in an automatable way by combining read clustering and genome assembly methods.

Conclusion
We anticipate that our workflow opens the way towards more efficient and less manual repeat characterization and annotation, benefitting all genome studies, but especially those of non-model organisms.