Fast Scalable Selection Algorithms for Large Scale Data

Cited by: 0
Authors
Thompson, Lee Parnell [1]
Xu, Weijia [2]
Miranker, Daniel P. [1]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Texas Austin, Texas Adv Comp Ctr, Austin, TX 78712 USA
Funding
U.S. National Institutes of Health
Keywords
Hadoop; Map Reduce; Selection Algorithms; Median Finding; Parallel Selection
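DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract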
Selection finding, and its most common form, median finding, is used as a measure of central tendency for problems in biology, databases, and graphics. These problems often use selection finding as a subcomponent that may be called many times, so speed is important. The Map/Reduce framework has been shown to be an important tool for creating scalable applications. There are a number of valid implementations of selection algorithms inside a Map/Reduce framework, some of which are compared in this paper. However, as the volume of data increases, subtle differences in algorithm design and implementation can lead to significant differences in practical performance. An efficient and scalable selection finding method therefore has the potential to benefit a number of applications. This paper compares algorithms that have been redesigned or created for the Map/Reduce framework for the purpose of selection finding, that is, finding the k-th ranked element in an unordered set. It takes concepts from two existing selection algorithms and translates them into a novel Map/Reduce method with two variations. Each approach uses a different methodology to reduce the total workload needed for a selection. All the algorithms are compared for scalability and efficiency in a computing cluster environment with up to 256 processing cores. The results show that the methods proposed in this paper outperform several common alternatives for identifying medians with Hadoop, including sorting, Pig, and BinMedian methods. Our implementations are also available upon request.
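To make the problem statement concrete, the following is a minimal sketch of selection over partitioned data written in a map/reduce style. It is a generic pivot-and-count scheme, not the method proposed in the paper, and it runs in a single process rather than on Hadoop. The names `count_around_pivot` and `select_kth`, the 0-based rank convention, and the random pivot choice are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def count_around_pivot(partition: Sequence[float], pivot: float) -> Tuple[int, int, int]:
    """Map step (hypothetical helper): count the elements of one partition
    that are below, equal to, and above the pivot."""
    less = sum(1 for x in partition if x < pivot)
    equal = sum(1 for x in partition if x == pivot)
    return less, equal, len(partition) - less - equal


def select_kth(partitions: List[List[float]], k: int, seed: int = 0) -> float:
    """Return the k-th smallest element (0-based) across all partitions.

    Each round picks a pivot, counts elements around it in every partition
    (map), sums the counts (reduce), and keeps only the side that must
    contain the answer.
    """
    total = sum(len(p) for p in partitions)
    if not 0 <= k < total:
        raise IndexError("k is out of range")
    rng = random.Random(seed)
    while True:
        # Pick a pivot from any non-empty partition; at least one exists
        # because more than k elements always remain.
        pivot = rng.choice(rng.choice([p for p in partitions if p]))
        counts = [count_around_pivot(p, pivot) for p in partitions]  # map
        less = sum(c[0] for c in counts)                             # reduce
        equal = sum(c[1] for c in counts)
        if k < less:
            # The answer is strictly below the pivot.
            partitions = [[x for x in p if x < pivot] for p in partitions]
        elif k < less + equal:
            # The pivot itself has rank k.
            return pivot
        else:
            # The answer is strictly above the pivot; shift the rank.
            k -= less + equal
            partitions = [[x for x in p if x > pivot] for p in partitions]
```

With `n` elements in total, the lower median under this convention is `select_kth(partitions, (n - 1) // 2)`. Every round discards at least the pivot, so the loop terminates, and only three counters per partition need to be communicated in each round.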
Pages: 9