Toward a new approach for sorting extremely large data files in the big data era

Cited by: 6
Authors
Shatnawi, Ali [1]
AlZahouri, Yathrip [1]
Shehab, Mohammed A. [1]
Jararweh, Yaser [1]
Al-Ayyoub, Mahmoud [1]
Affiliation
[1] Jordan Univ Sci & Technol, Box 3030, Irbid 22110, Jordan
Keywords
Big data; Sorting; External merge sort; Large file processing; Hybrid CPU-GPU; Algorithms
DOI
10.1007/s10586-018-2860-1
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The sheer volume of data and content generated today calls for a paradigm shift in how these data are processed and managed. Sorting is one of the most important data processing operations. Extremely large files are normally sorted with external algorithms such as external merge sort. Because swapping time dominates in many applications that handle large files, algorithms that minimize the number of swap operations normally outperform those that focus only on optimizing CPU time. We show that making multiple passes over the data set, as proposed in our algorithm, greatly reduces the number of swaps and thus the overall sorting time. Moreover, the proposed technique is well suited to emerging parallelization platforms such as GPUs. The reported results show that the proposed technique is superior for both "CPU only" and hybrid CPU-GPU implementations.
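For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a conventional two-phase external merge sort. It is not the authors' multi-pass algorithm, whose details are not given in this record. The function names (`write_sorted_runs`, `merge_runs`, `external_sort`), the `run_size` parameter, and the one-integer-per-line file format are all assumptions made for illustration.

```python
import heapq
import os
import tempfile
from itertools import islice


def write_sorted_runs(input_path, run_size, tmp_dir):
    """Phase 1: sort run_size values at a time in memory and write each run to disk."""
    run_paths = []
    with open(input_path) as src:
        while True:
            # Assumed format: one integer per line; blank lines are skipped.
            chunk = [int(line) for line in islice(src, run_size) if line.strip()]
            if not chunk:
                break
            chunk.sort()
            run_path = os.path.join(tmp_dir, f"run_{len(run_paths)}.txt")
            with open(run_path, "w") as run_file:
                run_file.writelines(f"{value}\n" for value in chunk)
            run_paths.append(run_path)
    return run_paths


def merge_runs(run_paths, output_path):
    """Phase 2: k-way merge of the sorted runs, streaming from disk."""
    files = [open(path) for path in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        with open(output_path, "w") as out:
            # heapq.merge keeps only one pending value per run in memory.
            for value in heapq.merge(*streams):
                out.write(f"{value}\n")
    finally:
        for f in files:
            f.close()


def external_sort(input_path, output_path, run_size=100_000):
    """Sort a file of integers that may not fit in memory; returns the number of runs."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = write_sorted_runs(input_path, run_size, tmp_dir)
        merge_runs(run_paths, output_path)
    return len(run_paths)
```

In this sketch, `run_size` stands in for the available memory: a smaller value produces more runs and therefore more disk traffic during the merge, which is the kind of I/O cost the abstract identifies as dominant for very large files.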
Pages: 819-828
Number of pages: 10
Related papers
50 in total
  • [1] Toward a new approach for sorting extremely large data files in the big data era
    Ali Shatnawi
    Yathrip AlZahouri
    Mohammed A. Shehab
    Yaser Jararweh
    Mahmoud Al-Ayyoub
    [J]. Cluster Computing, 2019, 22 : 819 - 828
  • [2] A Multi-Pass Algorithm for Sorting Extremely Large Data Files
    Shatnawi, Ali
    Alzahouri, Yathrip
    [J]. 2015 6TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2015, : 79 - 82
  • [3] SORTING LARGE DATA FILES ON POOMA
    BAUGSTO, BAW
    GREIPSLAND, JF
    KAMERBEEK, J
    [J]. LECTURE NOTES IN COMPUTER SCIENCE, 1990, 457 : 536 - 547
  • [4] A New Sorting Approach for Big Data Set: Hit Sort
    Santra, Soumen
    Mondal, Rohit
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATION, INSTRUMENTATION AND CONTROL (ICICIC), 2017,
  • [5] Continuing progress of spike sorting in the era of big data
    Carlson, David
    Carin, Lawrence
    [J]. CURRENT OPINION IN NEUROBIOLOGY, 2019, 55 : 90 - 96
  • [6] Business Intelligence and Marketing Insights in an Era of Big Data: The Q-sorting Approach
    Kim, Ki Youn
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2014, 8 (02): : 567 - 582
  • [7] Hybird cloud computing: A New Approach for Big Data Era
    Boonchieng, Ekkarat
    [J]. 2015 INTERNATIONAL COMPUTER SCIENCE AND ENGINEERING CONFERENCE (ICSEC), 2015,
  • [8] Data lake: a new ideology in big data era
    Khine, Pwint Phyu
    Wang, Zhao Shun
    [J]. 4TH ANNUAL INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATION AND SENSOR NETWORK (WCSN 2017), 2018, 17
  • [9] A New Era for Big Data and Chromatography
    Vivo-Truyols, Gabriel
    [J]. LC GC EUROPE, 2017, 30 (11) : 615 - 616
  • [10] Weathering a New Era of Big Data
    Greengard, Samuel
    [J]. COMMUNICATIONS OF THE ACM, 2014, 57 (09) : 12 - 14