ArrayUDF: User-Defined Scientific Data Analysis on Arrays

Cited by: 17
Authors
Dong, Bin [1]
Wu, Kesheng [1]
Byna, Surendra [1]
Liu, Jialin [1]
Zhao, Weijie [2]
Rusu, Florin [1, 2]
Affiliations
[1] Lawrence Berkeley Natl Lab, 1 Cyclotron Rd, Berkeley, CA 94720 USA
[2] Univ Calif, 5200 Lake Rd, Merced, CA 95343 USA
Keywords
ArrayUDF; User-Defined Data Analysis; Array Structural Locality; SciDB; MapReduce; Spark
DOI
10.1145/3078597.3078599
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
User-Defined Functions (UDFs) allow application programmers to specify analysis operations on data while leaving the data management tasks to the system. This general approach enables numerous custom analysis functions and is at the heart of modern Big Data systems. Even though the UDF mechanism can theoretically support arbitrary operations, a wide variety of common operations, such as computing the moving average of a time series or the vorticity of a fluid flow, are hard to express and slow to execute. Since these operations are traditionally performed on multi-dimensional arrays, we propose to extend the expressiveness of structural locality for supporting UDF operations on arrays. We further propose an in situ UDF mechanism, called ArrayUDF, to implement the structural locality. ArrayUDF allows users to define computations on adjacent array cells without the use of join operations and executes the UDF directly on arrays stored in data files without requiring their content to be loaded into a data management system. Additionally, we present a thorough theoretical analysis of the data access cost of exploiting the structural locality, which enables ArrayUDF to automatically select the best array partitioning strategy for a given UDF operation. In a series of performance evaluations on large scientific datasets, we have observed that, using the generic UDF interface, ArrayUDF consistently outperforms Spark, SciDB, and RasDaMan.
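The idea of defining a computation on adjacent array cells without a join can be illustrated with a minimal Python sketch. This is not the ArrayUDF interface: the `Stencil` class, the `apply_udf` helper, and the clamp-at-the-boundary rule are all hypothetical names and assumptions introduced here only to show a three-point moving average written as a function of a cell and its neighbors.

```python
from typing import Callable, List, Sequence


class Stencil:
    """Hypothetical view of a 1-D array centered on one cell.

    Neighbors are read by relative offset, so the user function never
    needs a join to reach adjacent cells.
    """

    def __init__(self, data: Sequence[float], center: int) -> None:
        self._data = data
        self._center = center

    def __call__(self, offset: int = 0) -> float:
        # Assumption: out-of-range offsets are clamped to the nearest edge cell.
        index = min(max(self._center + offset, 0), len(self._data) - 1)
        return self._data[index]


def apply_udf(data: Sequence[float], udf: Callable[[Stencil], float]) -> List[float]:
    """Apply the user-defined function once per cell and collect the results."""
    return [udf(Stencil(data, i)) for i in range(len(data))]


def moving_average(s: Stencil) -> float:
    """Three-point moving average over the left neighbor, the cell, and the right neighbor."""
    return (s(-1) + s(0) + s(1)) / 3.0


series = [3.0, 6.0, 9.0, 12.0]
smoothed = apply_udf(series, moving_average)
print(smoothed)
```

Under the clamping assumption, the first and last cells reuse their own value for the missing neighbor; a real system could equally well skip boundary cells or pad with a constant.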
Pages: 53-64
Number of pages: 12