Movi: A fast and cache-efficient full-text pangenome index

Authors
Zakeri, Mohsen [1]
Brown, Nathaniel K. [1]
Ahmed, Omar Y. [1]
Gagie, Travis [2]
Langmead, Ben [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 4R2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
SEQUENCE;
DOI
10.1016/j.isci.2024.111464
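Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes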
07 ; 0710 ; 09 ;
Abstract
Pangenome indexes are promising tools for many applications, including classification of nanopore sequencing reads. The move structure is a compressed-index data structure based on the Burrows-Wheeler Transform (BWT). It simultaneously offers O(1)-time queries and O(r) space, where r is the number of BWT runs (maximal sequences of consecutive identical characters). We developed Movi, based on the move structure, for indexing and querying pangenomes. Movi scales very well for repetitive text because its size grows strictly with r. Movi computes sophisticated matching queries for classification, such as pseudo-matching lengths and backward search, up to 30 times faster than existing methods by minimizing the number of cache misses and using memory prefetching to hide some of the memory latency. Movi's fast, constant-time query loop makes it well suited to real-time applications such as adaptive sampling for nanopore sequencing, where decisions must be made in a small and predictable time interval.
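To make the notions of BWT runs and a run-indexed LF step concrete, the following is a minimal Python sketch. It is not Movi's implementation: the function names (`bwt`, `runs`, `naive_lf`, `build_move_table`, `move_lf`), the table layout, and the naive sorted-rotations BWT construction are all assumptions chosen for illustration, and the sketch makes no attempt at cache efficiency or at bounding the fast-forward loop.

```python
import bisect


def bwt(text):
    """BWT by sorting all rotations (illustrative only, quadratic space).

    Assumes '$' does not occur in text and sorts before every other character.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def runs(b):
    """Return the maximal runs of b as (char, start, length) tuples."""
    out = []
    for i, c in enumerate(b):
        if out and out[-1][0] == c:
            ch, start, length = out[-1]
            out[-1] = (ch, start, length + 1)
        else:
            out.append((c, i, 1))
    return out


def naive_lf(b):
    """Reference LF mapping: LF[i] = C[b[i]] + rank of b[i] within b[:i]."""
    counts = {}
    for c in b:
        counts[c] = counts.get(c, 0) + 1
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    seen, lf = {}, []
    for c in b:
        lf.append(first[c] + seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    return lf


def build_move_table(b):
    """One row per run: (start, length, LF(start), index of run containing LF(start))."""
    rs = runs(b)
    lf = naive_lf(b)
    starts = [start for _, start, _ in rs]
    table = []
    for _, start, length in rs:
        dest = lf[start]
        dest_run = bisect.bisect_right(starts, dest) - 1
        table.append((start, length, dest, dest_run))
    return table


def move_lf(table, pos, run):
    """LF step using only the run table; pos must lie inside run."""
    start, _, dest, dest_run = table[run]
    new_pos = dest + (pos - start)  # LF is contiguous within a run
    new_run = dest_run
    # Fast-forward to the run that actually contains new_pos.
    while new_pos >= table[new_run][0] + table[new_run][1]:
        new_run += 1
    return new_pos, new_run


if __name__ == "__main__":
    b = bwt("banana")
    table = build_move_table(b)
    print(b, "n =", len(b), "r =", len(table))
```

The table has one row per run, so its size is proportional to r rather than to the text length, which is the property the abstract refers to. Each LF step reads one row and then scans forward over neighbouring rows, which hints at why such a layout can be made cache-friendly.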
Pages: 12