Finding long tandem repeats in long noisy reads

Cited by: 3
Authors
Morishita, Shinichi [1]
Ichikawa, Kazuki [1]
Myers, Eugene W. [2, 3]
Affiliations
[1] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol & Med Sci, Chiba 2778562, Japan
[2] Max Planck Inst Mol Cell Biol & Genet, D-01307 Dresden, Saxony, Germany
[3] Ctr Syst Biol Dresden, D-01307 Dresden, Saxony, Germany
Keywords
FRAGILE-X; MYOTONIC-DYSTROPHY; HEXANUCLEOTIDE REPEAT; TRINUCLEOTIDE REPEAT; CTG REPEAT; EXPANSION; REGION; IDENTIFICATION; C9ORF72; FINDER;
DOI
10.1093/bioinformatics/btaa865
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Long tandem repeat expansions of more than 1000 nt have been suggested to be associated with diseases, but remain largely unexplored in individual human genomes because read lengths have been too short. However, new long-read sequencing technologies can produce single reads of 10 000 nt or more that can span such repeat expansions, although these long reads have high error rates of 10-20%, which complicates the detection of repetitive elements. Moreover, most traditional algorithms for finding tandem repeats are designed to find short tandem repeats (< 1000 nt) and cannot effectively handle the high error rate of long reads in a reasonable amount of time.

Results: Here, we report an efficient algorithm for solving this problem that takes advantage of the length of the repeat. Namely, a long tandem repeat has hundreds or thousands of approximate copies of the repeated unit, so despite the error rate, many short k-mers will be error-free in many copies of the unit. We exploited this characteristic to develop a method for first estimating regions that could contain a tandem repeat, by analyzing the k-mer frequency distributions of fixed-size windows across the target read, followed by an algorithm that assembles the k-mers of a putative region into the consensus repeat unit by greedily traversing a de Bruijn graph. Experimental results indicated that the proposed algorithm largely outperformed Tandem Repeats Finder, a widely used program for finding tandem repeats, in terms of sensitivity.
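
The two stages named in the abstract (window-wise k-mer frequency screening, then a greedy de Bruijn graph walk) can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the repeated-k-mer score, the 0.5 threshold, the values of k, and the window size are all assumptions chosen for illustration, and the toy read contains error-free copies of a hypothetical 10-nt unit, whereas real long reads carry 10-20% errors.

```python
import random
from collections import Counter


def repeat_fraction_per_window(read, k=8, window=200, step=200):
    """Assumed score: for each fixed-size window, the fraction of k-mer
    occurrences that belong to k-mers seen more than once in that window."""
    scores = []
    for start in range(0, max(1, len(read) - window + 1), step):
        chunk = read[start:start + window]
        counts = Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))
        total = sum(counts.values())
        repeated = sum(c for c in counts.values() if c > 1)
        scores.append((start, repeated / total if total else 0.0))
    return scores


def greedy_consensus_unit(region, k=5):
    """Greedy walk on the k-mer (de Bruijn) graph of a region: start at the
    most frequent k-mer and always follow the most frequent successor until
    the walk returns to the start (a cycle, i.e. one repeat unit)."""
    counts = Counter(region[i:i + k] for i in range(len(region) - k + 1))
    if not counts:
        return ""
    start = counts.most_common(1)[0][0]
    unit, node, seen = start, start, {start}
    while True:
        successors = [(counts[node[1:] + b], b) for b in "ACGT"
                      if counts[node[1:] + b] > 0]
        if not successors:
            return unit                       # dead end: no cycle found
        _, base = max(successors)             # most frequent outgoing edge
        node = node[1:] + base
        if node == start:
            return unit[:len(unit) - (k - 1)]  # cycle closed: drop the overlap
        if node in seen:
            return unit                       # looped without reaching start
        seen.add(node)
        unit += base


# Toy read: random flanks around 60 exact copies of a hypothetical unit.
rng = random.Random(0)
unit = "ACGTTGCAAG"
flank = lambda n: "".join(rng.choice("ACGT") for _ in range(n))
read = flank(400) + unit * 60 + flank(400)

scores = repeat_fraction_per_window(read)
hits = [start for start, frac in scores if frac >= 0.5]   # assumed threshold
region = read[min(hits):max(hits) + 200]
consensus = greedy_consensus_unit(region)
print(hits, consensus)
```

On this toy input the screened windows fall inside the repeat, and the walk recovers the unit up to rotation, since a tandem repeat has no preferred starting position. With realistic error rates the k-mer counts become noisy, so the greedy choice of the most frequent successor is what lets error-free k-mers dominate; a practical tool would also need to handle ambiguous branches and imprecise region boundaries.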
Pages: 612-621
Page count: 10