How to Find Long Maximal Exact Matches and Ignore Short Ones

Author
Gagie, Travis [1]
Affiliation
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS, Canada
Source
DEVELOPMENTS IN LANGUAGE THEORY, DLT 2024 | 2024 / Vol. 14791
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Maximal exact matches; pangenomics; Burrows-Wheeler Transform; grammar-based compression
DOI
10.1007/978-3-031-66159-4_10
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline code
081203; 0835
Abstract
Finding maximal exact matches (MEMs) between strings is an important task in bioinformatics, but it is becoming increasingly challenging as geneticists switch to pangenomic references. Fortunately, we are usually interested only in the relatively few MEMs that are longer than we would expect by chance. In this paper we show that under reasonable assumptions we can find all MEMs of length at least L between a pattern of length m and a text of length n in O(m) time plus extra O(log n) time only for each MEM of length at least nearly L using a compact index for the text, suitable for pangenomics.
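To make the problem statement concrete, here is a naive Python sketch of what the abstract asks for. It is not the paper's compact-index method, which this record does not describe, and it runs far slower than O(m). It assumes the definition of a MEM commonly used in this line of work: a substring P[i..j] of the pattern that occurs in the text T, while neither P[i-1..j] nor P[i..j+1] does. The function names `matching_statistics` and `long_mems` are hypothetical.

```python
def matching_statistics(pattern: str, text: str) -> list[int]:
    """Naive matching statistics: ms[i] is the length of the longest prefix
    of pattern[i:] that occurs somewhere in text (illustration only)."""
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Grow the match one character at a time while it still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def long_mems(pattern: str, text: str, min_len: int) -> list[tuple[int, int]]:
    """Return (start, length) for every MEM of pattern with respect to text
    whose length is at least min_len (and at least 1)."""
    ms = matching_statistics(pattern, text)
    result = []
    for i, length in enumerate(ms):
        # pattern[i:i+length] cannot be extended to the right, by definition of ms[i].
        # It is left-maximal unless pattern[i-1:i+length] also occurs in text,
        # which happens exactly when ms[i-1] > length.
        left_maximal = i == 0 or ms[i - 1] <= length
        if length > 0 and length >= min_len and left_maximal:
            result.append((i, length))
    return result
```

For example, `long_mems("TTACG", "GATTACA", 1)` returns `[(0, 4), (4, 1)]` ("TTAC" and "G"), while raising the threshold to 2 keeps only `(0, 4)`. The abstract's point is that a threshold L lets an index skip most of the work for the short MEMs that this brute-force version still computes.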
Pages: 131-140
Number of pages: 10