Toward a new approach for sorting extremely large data files in the big data era

Cited by: 0
Authors
Ali Shatnawi
Yathrip AlZahouri
Mohammed A. Shehab
Yaser Jararweh
Mahmoud Al-Ayyoub
Affiliation
[1] Jordan University of Science & Technology
Source
Cluster Computing, 2019, Vol. 22
Keywords
Big data; Sorting; External merge sort; Large file processing; Hybrid CPU–GPU
DOI
Not available
Abstract
The vast amount of data and content generated today will require a paradigm shift in how these data are processed and managed. Sorting is one of the most important data processing operations. Extremely large files are normally sorted with external algorithms such as merge sort, and making multiple passes in external merge sort can greatly speed up the sorting of such files. Because swapping time dominates in many applications that handle large files, algorithms that minimize swapping operations usually outperform those that optimize only CPU time. We show that making multiple passes over the data set, as our proposed algorithm does, greatly reduces the number of swaps and therefore the overall sorting time. Moreover, the proposed technique is well suited to emerging parallelization platforms such as GPUs. The reported results show the superiority of the proposed technique in both “CPU only” and hybrid CPU–GPU implementations.
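The abstract does not spell out the authors' algorithm, so the following is only a minimal sketch of the conventional multi-pass external merge sort that it builds on, not the proposed method. It assumes integer data stored one value per line in temporary files. `memory_limit` (values held in memory per initial run) and `fan_in` (runs merged at once, k) are hypothetical parameters chosen for illustration. Pass 1 writes sorted runs to disk, and each later pass k-way merges groups of runs until one remains. With R initial runs, this takes 1 + ⌈log_k R⌉ passes. The sketch does not model swap counts or any GPU offloading.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def write_run(values: Iterable[int], directory: str) -> str:
    """Write a sorted sequence to a new temporary run file (one integer per line)."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{v}\n" for v in values)
    return path


def read_run(path: str) -> Iterator[int]:
    """Stream a run file back one value at a time instead of loading it whole."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_merge_sort(values: Iterable[int], memory_limit: int,
                        fan_in: int) -> Tuple[List[int], int]:
    """Sort `values`, holding at most `memory_limit` items in memory per run.

    Returns the sorted values and the number of passes over the data
    (one run-generation pass plus one pass per round of k-way merging).
    """
    if memory_limit < 1 or fan_in < 2:
        raise ValueError("memory_limit must be >= 1 and fan_in must be >= 2")
    with tempfile.TemporaryDirectory() as tmp:
        # Pass 1: cut the input into memory-sized chunks, sort each, spill to disk.
        runs: List[str] = []
        chunk: List[int] = []
        for v in values:
            chunk.append(v)
            if len(chunk) == memory_limit:
                runs.append(write_run(sorted(chunk), tmp))
                chunk = []
        if chunk:
            runs.append(write_run(sorted(chunk), tmp))
        passes = 1
        # Later passes: merge up to `fan_in` runs at a time until one run remains.
        while len(runs) > 1:
            next_runs: List[str] = []
            for i in range(0, len(runs), fan_in):
                group = runs[i:i + fan_in]
                merged = heapq.merge(*(read_run(p) for p in group))
                next_runs.append(write_run(merged, tmp))
                for p in group:
                    os.remove(p)  # free the disk space held by consumed runs
            runs = next_runs
            passes += 1
        # Read the single remaining run before the temporary directory is deleted.
        result = list(read_run(runs[0])) if runs else []
    return result, passes
```

For example, `external_merge_sort([5, 3, 1, 4, 2, 9, 8, 7, 6, 0], 3, 2)` produces 4 initial runs and needs two merge rounds, so it reports 3 passes in total. Raising `fan_in` reduces the number of passes but requires more runs to be open at the same time.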
Pages: 819-828
Page count: 9