Scalable Suffix Sorting on a Multicore Machine

Cited by: 6
Authors
Xie, Jing Yi [1]
Nong, Ge [1]
Lao, Bin [2]
Xu, Wentao [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510275, Guangdong, Peoples R China
[2] Guangdong Univ Foreign Studies, Sch Informat Sci & Technol, Guangzhou 510420, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sorting; Random access memory; Indexes; Multicore processing; Arrays; Task analysis; Big Data; Suffix sorting; algorithm design; multicore computer; ARRAY CONSTRUCTION;
DOI
10.1109/TC.2020.2972546
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
A number of methods have been proposed for suffix sorting in internal memory (RAM) and in external memory (hard disks). The current best results in both settings are achieved by several algorithms that use the induced sorting (IS) method in various ways. While these algorithms are efficient, the internal-memory ones differ substantially in design from the external-memory ones. A scalable IS method that can be applied to suffix sorting in both internal and external memory is highly desirable. This article proposes a blockwise IS method to facilitate pipelined access in internal memory and sequential I/Os in external memory. The detailed algorithm for using this method in a 4-stage pipeline with multiple threads is described, where multiple threads parallelize not only the pipelined stages of consecutive blocks but also the tasks within each stage wherever possible. In our experiments on a set of realistic and artificial datasets, this algorithm achieves better overall time and space performance than the existing best results from pSACAK, pDSS and pKS. Besides sorting suffixes in internal memory in linear time, the proposed method can be ported to external memory for sorting massive suffixes with linear I/O complexity.
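For readers unfamiliar with the term, the following is a minimal sketch of what suffix sorting produces: the starting positions of all suffixes of a text, listed in lexicographic order (the suffix array). It is a naive comparison sort for illustration only, not the blockwise IS method of the article, and it is far from linear time. The function name `naive_suffix_array` is an assumption made for this example.

```python
from typing import List


def naive_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Illustrative only: each comparison copies a suffix, so this is much slower
    than the linear-time induced sorting approach the abstract refers to.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    # Suffixes of "banana": a(5) < ana(3) < anana(1) < banana(0) < na(4) < nana(2)
    print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

A naive sort like this is useful mainly as a reference for checking the output of a faster algorithm on small inputs.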
Pages: 1364-1375
Number of pages: 12
Related papers
50 in total
  • [41] SMR: Scalable MapReduce for Multicore Systems
    Zhang, Yu
    Yu, Yufen
    Chen, Jiankang
    IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2018, : 684 - 691
  • [42] A Grammar Compression Algorithm based on Induced Suffix Sorting
    Nogueira Nunes, Daniel Saad
    Louza, Felipe A.
    Gog, Simon
    Ayala-Rincon, Mauricio
    Navarro, Gonzalo
    2018 DATA COMPRESSION CONFERENCE (DCC 2018), 2018, : 42 - 51
  • [43] Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision
    Petersen, Felix
    Borgelt, Christian
    Kuehne, Hilde
    Deussen, Oliver
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [44] Highly scalable computational algorithms on emerging parallel machine multicore architectures: development and implementation in CFD context
    Kannan, R.
    Harrand, V.
    Lee, M.
    Przekwas, A. J.
    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, 2013, 73 (10) : 869 - 882
  • [45] Antisequential suffix sorting for BWT-based data compression
    Baron, D
    Bresler, Y
    IEEE TRANSACTIONS ON COMPUTERS, 2005, 54 (04) : 385 - 397
  • [46] Optimal suffix sorting and LCP array construction for constant alphabets
    Louza, Felipe A.
    Gog, Simon
    Telles, Guilherme P.
    INFORMATION PROCESSING LETTERS, 2017, 118 : 30 - 34
  • [47] Building and Checking Suffix Array Simultaneously by Induced Sorting Method
    Lao, Bin
    Wu, Yi
    Nong, Ge
    Chan, Wai Hong
    IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (04) : 756 - 765
  • [48] Suffix sorting via Shannon-Fano-Elias codes
    Adjeroh, Don
    Nan, Fei
    DCC: 2008 DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2008, : 502 - 502
  • [49] Scalable RNA Sequencing on Clusters of Multicore Processors
    Martinez, Hector
    Barrachina, Sergio
    Castillo, Maribel
    Tarraga, Joaquin
    Medina, Ignacio
    Dopazo, Joaquin
    Quintana-Orti, Enrique S.
    2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 3, 2015, : 190 - 195
  • [50] Scalable Analysis of Multicore Data Reuse and Sharing
    Pericas, Miquel
    Taura, Kenjiro
    Matsuoka, Satoshi
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, (ICS'14), 2014, : 353 - 362