Movi: A fast and cache-efficient full-text pangenome index

Times cited: 0
Authors
Zakeri, Mohsen [1]
Brown, Nathaniel K. [1]
Ahmed, Omar Y. [1]
Gagie, Travis [2]
Langmead, Ben [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 4R2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
SEQUENCE;
DOI
10.1016/j.isci.2024.111464
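Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [General natural sciences]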
Discipline codes
07; 0710; 09
Abstract
Pangenome indexes are promising tools for many applications, including the classification of nanopore sequencing reads. The move structure is a compressed-index data structure based on the Burrows-Wheeler Transform (BWT). It simultaneously offers O(1)-time queries and O(r) space, where r is the number of BWT runs (maximal runs of consecutive identical characters). We developed Movi, an index based on the move structure, for indexing and querying pangenomes. Movi scales very well to repetitive text because its size grows only with r. Movi computes sophisticated matching queries for classification, such as pseudo-matching lengths and backward search, up to 30 times faster than existing methods by minimizing the number of cache misses and by using memory prefetching to achieve a degree of latency hiding. Its fast, constant-time query loop makes Movi well suited to real-time applications such as adaptive sampling for nanopore sequencing, where decisions must be made within a small and predictable time interval.
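A toy example can make the abstract's key quantities concrete. The Python sketch below is an illustration under simplifying assumptions, not Movi's implementation. It builds a BWT by naively sorting rotations, counts its runs r, and stores one table row per run holding the LF value of the run head. Because LF is contiguous within a run, one LF step is a single table lookup plus a forward scan to the run that contains the new position. The sketch omits the balancing (run-splitting) step that the move structure uses to bound that scan by a constant. It also omits pseudo-matching lengths, backward search, and prefetching. All function names (`bwt`, `bwt_runs`, `build_move_table`, `move`, `invert`) are hypothetical.

```python
from bisect import bisect_right


def bwt(text):
    """Naive BWT by sorting all rotations; `text` must end with a unique,
    lexicographically smallest sentinel such as '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def bwt_runs(b):
    """Maximal runs of identical characters as (char, start, length)."""
    runs = []
    for i, ch in enumerate(b):
        if runs and runs[-1][0] == ch:
            c, start, length = runs[-1]
            runs[-1] = (c, start, length + 1)
        else:
            runs.append((ch, i, 1))
    return runs


def build_move_table(b):
    """One row per BWT run: (char, start, length, LF(start), run holding LF(start))."""
    runs = bwt_runs(b)
    # C[c] = number of BWT characters strictly smaller than c
    C, total = {}, 0
    for ch in sorted(set(b)):
        C[ch] = total
        total += b.count(ch)
    starts = [start for _, start, _ in runs]
    seen = {}  # occurrences of each character in earlier runs
    table = []
    for ch, start, length in runs:
        lf_head = C[ch] + seen.get(ch, 0)  # LF(start) = C[c] + rank_c(start)
        seen[ch] = seen.get(ch, 0) + length
        dest = bisect_right(starts, lf_head) - 1  # run containing LF(start)
        table.append((ch, start, length, lf_head, dest))
    return table


def move(table, run, offset):
    """One LF step on a position stored as (run index, offset in run).
    Inside a run LF is contiguous, so LF(start + k) = LF(start) + k."""
    _, _, _, lf_head, dest = table[run]
    pos = lf_head + offset
    # Fast-forward to the run that actually contains pos. Without balancing,
    # this loop is not bounded by a constant in this toy version.
    while dest + 1 < len(table) and table[dest + 1][1] <= pos:
        dest += 1
    return dest, pos - table[dest][1]


def invert(table, n):
    """Recover the text (minus the sentinel) using only move-table LF steps,
    starting from row 0, the rotation that begins with the sentinel."""
    run, offset, chars = 0, 0, []
    for _ in range(n - 1):
        chars.append(table[run][0])
        run, offset = move(table, run, offset)
    return "".join(reversed(chars))


if __name__ == "__main__":
    b = bwt("banana$")
    table = build_move_table(b)
    print(b, len(table), invert(table, len(b)))
```

Running it prints `annb$aa 5 banana`. That line shows the BWT of `banana$`, its r = 5 runs, and the text recovered purely by move-table LF steps. The table has r rows rather than one per text position, which is the O(r)-space property the abstract refers to.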
Pages: 12