Movi: A fast and cache-efficient full-text pangenome index

Cited by: 0
Authors
Zakeri, Mohsen [1]
Brown, Nathaniel K. [1]
Ahmed, Omar Y. [1]
Gagie, Travis [2]
Langmead, Ben [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 4R2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
SEQUENCE;
DOI
10.1016/j.isci.2024.111464
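Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract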
Pangenome indexes are promising tools for many applications, including classification of nanopore sequencing reads. The move structure is a compressed-index data structure based on the Burrows-Wheeler Transform (BWT). It offers O(1)-time queries and O(r) space simultaneously, where r is the number of BWT runs (maximal sequences of consecutive identical characters). We developed Movi, a tool based on the move structure for indexing and querying pangenomes. Movi scales well on repetitive text because its size grows with r rather than with the total text length. By minimizing cache misses and using memory prefetching to hide latency, Movi computes the matching queries used for classification, such as pseudo-matching lengths and backward search, up to 30 times faster than existing methods. Movi's fast constant-time query loop makes it well suited to real-time applications such as adaptive sampling for nanopore sequencing, where decisions must be made within a small and predictable time interval.
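To make the terms in the abstract concrete, the following toy Python sketch computes a BWT, counts its runs (the quantity r), and performs an LF-mapping step through a per-run table in the spirit of the move structure. It is an illustration under stated assumptions, not Movi's implementation: the function names (`bwt`, `runs`, `build_move_table`, `lf_step`) and the `$` end marker are hypothetical choices, the BWT is built by naive rotation sorting, and the sketch omits the run splitting that the move structure relies on to bound the fast-forward loop by a constant.

```python
from bisect import bisect_right
from itertools import groupby


def bwt(text: str) -> str:
    """Naive BWT by sorting rotations; '$' is an assumed end marker."""
    assert "$" not in text
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def runs(s: str):
    """Maximal runs of identical characters as (char, length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def build_move_table(bwt_str: str):
    """One row per BWT run: (start, length, LF(start), index of run holding LF(start))."""
    rs = runs(bwt_str)
    starts, pos = [], 0
    for _, length in rs:
        starts.append(pos)
        pos += length
    counts = {}
    for ch, length in rs:
        counts[ch] = counts.get(ch, 0) + length
    # first[c] = number of characters in the text that are smaller than c
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    seen = dict.fromkeys(counts, 0)  # occurrences of each char before the current run
    table = []
    for (ch, length), start in zip(rs, starts):
        lf_start = first[ch] + seen[ch]
        seen[ch] += length
        dest = bisect_right(starts, lf_start) - 1
        table.append((start, length, lf_start, dest))
    return table


def lf_step(table, run_idx: int, i: int):
    """Map BWT position i (inside run run_idx) to (destination run, LF(i))."""
    start, _, lf_start, dest = table[run_idx]
    j = lf_start + (i - start)  # offsets within a run are preserved by LF
    # Fast-forward to the run containing j; unbounded here because
    # this sketch does not split runs.
    while j >= table[dest][0] + table[dest][1]:
        dest += 1
    return dest, j


if __name__ == "__main__":
    b = bwt("banana")
    print(b, len(runs(b)))  # annb$aa 5
```

The table has one row per run, so its size is proportional to r, and each `lf_step` touches only a row lookup plus a short sequential scan, which is the access pattern that makes cache-friendly layouts possible.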
Pages: 12