Ultra-fast genome comparison for large-scale genomic experiments

Cited by: 25
Authors
Perez-Wohlfeil, Esteban [1 ]
Diaz-del-Pino, Sergio [1 ]
Trelles, Oswaldo [1 ]
Affiliations
[1] Univ Malaga, Comp Architecture Dept, Inst Invest Biomed Malaga IBIMA, Malaga, Spain
Keywords
GENERATION;
DOI
10.1038/s41598-019-46773-w
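Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];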
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
In the last decade, a technological shift in the bioinformatics field has occurred: larger genomes can now be sequenced quickly and cost-effectively, resulting in the computational need to efficiently compare large and abundant sequences. Furthermore, detecting conserved similarities across large collections of genomes remains a problem. The size of chromosomes, along with the substantial amount of noise and number of repeats found in DNA sequences (particularly in mammals and plants), leads to a scenario where executing and waiting for complete outputs is both time- and resource-consuming. Filtering steps, manual examination and annotation, very long execution times and a high demand for computational resources represent a few of the many difficulties faced in large genome comparisons. In this work, we provide a method designed for comparisons of considerable amounts of very long sequences that employs a heuristic algorithm capable of separating noise and repeats from conserved fragments in pairwise genomic comparisons. We provide a software implementation that computes in linear time using one core as a minimum and a small, constant memory footprint. The method produces both a previsualization of the comparison and a collection of indices to drastically reduce computational complexity when performing exhaustive comparisons. Last, the method scores the comparison to automate classification of sequences and produces a list of detected synteny blocks to enable new evolutionary studies.
Pages: 10
Related papers
50 in total
  • [1] Ultra-fast genome comparison for large-scale genomic experiments
    Esteban Pérez-Wohlfeil
    Sergio Diaz-del-Pino
    Oswaldo Trelles
    [J]. Scientific Reports, 9
  • [2] Fast algorithms for large-scale genome alignment and comparison
    Delcher, AL
    Phillippy, A
    Carlton, J
    Salzberg, SL
    [J]. NUCLEIC ACIDS RESEARCH, 2002, 30 (11) : 2478 - 2483
  • [3] MADOKA: an ultra-fast approach for large-scale protein structure similarity searching
    Lei Deng
    Guolun Zhong
    Chenzhe Liu
    Judong Luo
    Hui Liu
    [J]. BMC Bioinformatics, 20
  • [4] LUSIPHER Large-scale Ultra-fast SIngle PHoto-Electron trackeR
    Dominjon, A.
    Chabanat, E.
    Depasse, P.
    Barbier, R.
    Baudot, J.
    Dulinski, W.
    Dorokhov, A.
    Winter, M.
    [J]. 2009 IEEE NUCLEAR SCIENCE SYMPOSIUM CONFERENCE RECORD, VOLS 1-5, 2009, : 1527 - +
  • [5] MADOKA: an ultra-fast approach for large-scale protein structure similarity searching
    Deng, Lei
    Zhong, Guolun
    Liu, Chenzhe
    Luo, Judong
    Liu, Hui
    [J]. BMC BIOINFORMATICS, 2019, 20 (01)
  • [6] A new ultra-fast PCA method for large scale data
    Li, Zilong
    Albrechtsen, Anders
    Meisner, Jonas
    [J]. HUMAN HEREDITY, 2022, VOL. (SUPPL 1) : 10 - 10
  • [7] ON THE APPLICATION OF ULTRA-FAST RARE EXPERIMENTS
    NORRIS, DG
    BORNERT, P
    REESE, T
    LEIBFRITZ, D
    [J]. MAGNETIC RESONANCE IN MEDICINE, 1992, 27 (01) : 142 - 164
  • [8] EXPERIMENTS USING ULTRA-FAST PULSE TECHNIQUES
    GOLDRING, G
    [J]. NUCLEAR INSTRUMENTS & METHODS, 1961, 11 (01) : 29 - 38
  • [9] Large-Scale Comparison Analysis of Genome Sequences
    Tang Haixu
    Ding Dafu(Shanghai Institute of Biochemistry
    [J]. Journal of Biomathematics (生物数学学报), 1997, (02) : 97 - 103
  • [10] GenMap: ultra-fast computation of genome mappability
    Pockrandt, Christopher
    Alzamel, Mai
    Iliopoulos, Costas S.
    Reinert, Knut
    [J]. BIOINFORMATICS, 2020, 36 (12) : 3687 - 3692