Toward a new approach for sorting extremely large data files in the big data era

Authors
Ali Shatnawi
Yathrip AlZahouri
Mohammed A. Shehab
Yaser Jararweh
Mahmoud Al-Ayyoub
Affiliations
[1] Jordan University of Science & Technology,
Source
Cluster Computing | 2019 / Vol. 22
Keywords
Big data; Sorting; External merge sort; Large file processing; Hybrid CPU–GPU
DOI
Not available
Abstract
The volume of data and content generated today calls for a paradigm shift in how these data are processed and managed. Sorting is one of the most important data processing operations, and extremely large files are normally sorted with external algorithms such as merge sort. Because swapping time dominates in many applications that handle large files, algorithms that minimize swapping operations are normally superior to those that focus only on optimizing CPU time. Using multiple passes in external merge sort strongly affects how quickly extremely large data files can be sorted: making multiple passes over the data set, as our algorithm proposes, is shown to greatly reduce the number of swaps and thus the overall sorting time. Moreover, the proposed technique is well suited to emerging parallelization platforms such as GPUs. The reported results show the superiority of the proposed technique for both “CPU only” and hybrid CPU–GPU implementations.
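For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a textbook two-phase external merge sort. It is not the authors' multi-pass algorithm, which this record does not describe. The function name `external_sort`, the `run_size` parameter, and the input format (a text file with one integer per line and no blank lines) are assumptions made for illustration only.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, run_size=100_000):
    """Sort a file of integers (one per line) while holding at most
    run_size values in memory at once. Returns the number of runs.

    Hypothetical helper: a generic external merge sort, not the
    method proposed in the paper.
    """
    run_paths = []

    # Phase 1: read run_size lines at a time, sort them in memory,
    # and write each sorted run to its own temporary file.
    with open(src_path) as src:
        while True:
            chunk = [int(line) for line in islice(src, run_size)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{x}\n" for x in chunk)
            run_paths.append(path)

    # Phase 2: k-way merge of the sorted runs. heapq.merge consumes
    # its inputs lazily, so only one value per run is held in memory.
    runs = [open(p) for p in run_paths]
    try:
        with open(dst_path, "w") as dst:
            streams = [(int(line) for line in f) for f in runs]
            dst.writelines(f"{x}\n" for x in heapq.merge(*streams))
    finally:
        for f in runs:
            f.close()
        for p in run_paths:
            os.remove(p)

    return len(run_paths)
```

In this sketch every value is written to disk once as part of a run and read back once during the merge. The number of such disk transfers is the kind of cost that, according to the abstract, the proposed technique aims to reduce.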
Pages: 819–828
Number of pages: 9