Movi: A fast and cache-efficient full-text pangenome index

Authors
Zakeri, Mohsen [1]
Brown, Nathaniel K. [1]
Ahmed, Omar Y. [1]
Gagie, Travis [2]
Langmead, Ben [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 4R2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
SEQUENCE
DOI
10.1016/j.isci.2024.111464
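Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract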
Pangenome indexes are promising tools for many applications, including classification of nanopore sequencing reads. The move structure is a compressed-index data structure based on the Burrows-Wheeler Transform (BWT). It offers O(1)-time queries and O(r) space simultaneously, where r is the number of BWT runs (maximal sequences of consecutive identical characters). We developed Movi, an index based on the move structure, for indexing and querying pangenomes. Movi scales very well for repetitive text because its size grows strictly with r. Movi computes sophisticated matching queries used for classification, such as pseudo-matching lengths and backward search, up to 30 times faster than existing methods by minimizing the number of cache misses and using memory prefetching to hide some of the memory latency. Movi's fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made within a small and predictable time interval.
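To make the quantities in the abstract concrete, the following is a minimal Python sketch of a BWT, its run count r, and a run-indexed table that answers LF-mapping steps by table lookup. It is an illustration under stated assumptions, not Movi's implementation: the function names (`bwt`, `count_runs`, `naive_lf`, `build_move_table`, `move_lf`) are hypothetical, the BWT is built by naive rotation sorting, and the table omits the run splitting that a real move structure would need to bound the fast-forward loop by a constant.

```python
import bisect
from itertools import groupby


def bwt(text):
    """BWT via sorted rotations. '$' is appended as a sentinel (assumed absent from text)."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(b):
    """r: the number of maximal runs of identical characters in the BWT."""
    return sum(1 for _ in groupby(b))


def naive_lf(b):
    """Reference LF mapping: a stable sort of BWT positions by character."""
    order = sorted(range(len(b)), key=lambda i: b[i])  # Python's sort is stable
    lf = [0] * len(b)
    for rank, i in enumerate(order):
        lf[i] = rank
    return lf


def build_move_table(b):
    """One row per run: (run start p, LF(p), index of the run containing LF(p)).

    The table has r rows, which is where the O(r) space comes from.
    """
    lf = naive_lf(b)
    starts, pos = [], 0
    for _, grp in groupby(b):
        starts.append(pos)
        pos += len(list(grp))
    return [(p, lf[p], bisect.bisect_right(starts, lf[p]) - 1) for p in starts]


def move_lf(rows, j, i):
    """LF step for BWT position j, which lies in run i.

    Returns (LF(j), index of the run containing LF(j)).
    Within a run, LF is contiguous: LF(j) = LF(p) + (j - p).
    """
    p, q, k = rows[i]
    j2 = q + (j - p)
    # Fast-forward to the run that contains j2. Without run splitting
    # (omitted in this sketch) the number of iterations is not bounded
    # by a constant.
    while k + 1 < len(rows) and rows[k + 1][0] <= j2:
        k += 1
    return j2, k


if __name__ == "__main__":
    b = bwt("GATTACA" * 8)
    print(len(b), count_runs(b))
```

For repetitive input such as the repeated string in the usage lines, the run count is smaller than the text length, which is the property a run-length index relies on. Each `move_lf` call reads one table row plus a short sequential scan of neighboring rows, an access pattern that suits caching and prefetching.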
Pages: 12