Declarative Parameterizations of User-Defined Functions for Large-Scale Machine Learning and Optimization

Cited by: 8
Authors
Gao, Zekai J. [1 ]
Pansare, Niketan [2 ]
Jermaine, Christopher [1 ]
Affiliations
[1] Rice Univ, Houston, TX 77251 USA
[2] IBM Res Almaden, San Jose, CA 95120 USA
Funding
National Science Foundation (USA);
Keywords
Large-scale machine learning; user-defined functions; declarative systems; join-and-co-group; MAP-REDUCE; ARCHITECTURE;
DOI
10.1109/TKDE.2018.2873325
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale optimization has become an important application for data management systems, particularly in the context of statistical machine learning. In this paper, we consider how one might implement the join-and-co-group pattern in the context of a fully declarative data processing system. The join-and-co-group pattern is ubiquitous in iterative, large-scale optimization. In the join-and-co-group pattern, a user-defined function g is parameterized with a data object x as well as the subset of the statistical model Θ(x) that applies to that object, so that g(x | Θ(x)) can be used to compute a partial update of the model. This is repeated for every x in the full data set X. All partial updates are then aggregated and used to perform a complete update of the model. The join-and-co-group pattern has several implementation challenges, including the potential for a massive blow-up in the size of a fully parameterized model. Thus, unless the correct physical execution plan is chosen for implementing the join-and-co-group pattern, an execution can easily take a very long time or even fail to complete. In this paper, we carefully consider the alternatives for implementing the join-and-co-group pattern on top of a declarative system, as well as how the best alternative can be selected automatically. Our focus is on the SimSQL database system, which is an SQL-based system with special facilities for large-scale, iterative optimization. Since it is an SQL-based system with a query optimizer, those choices can be made automatically.
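The pattern described in the abstract can be illustrated with a minimal, single-machine sketch in Python. This is not SimSQL code and not the authors' implementation; the names `join_and_cogroup`, `keys_for`, `g`, and `step`, the sparse linear-regression example, and the learning rate are all assumptions chosen for illustration. The sketch joins each data object x with the model subset Θ(x), applies g to get a partial update, and aggregates the partial updates into one complete model update.

```python
from collections import defaultdict

def join_and_cogroup(data, model, keys_for, g):
    """One pass of the pattern: join, apply g, aggregate partial updates."""
    totals = defaultdict(float)
    for x in data:
        # "Join": fetch Theta(x), the part of the model that applies to x.
        theta_x = {k: model[k] for k in keys_for(x)}
        # Apply the user-defined function g(x | Theta(x)).
        for k, delta in g(x, theta_x).items():
            # "Co-group"/aggregate: sum partial updates per model key.
            totals[k] += delta
    return dict(totals)

# Hypothetical example: squared-error gradient for a sparse linear model.
# Each data object is (features: dict[str, float], label: float).
def keys_for(x):
    features, _ = x
    return features.keys()

def g(x, theta_x):
    features, label = x
    pred = sum(theta_x[k] * v for k, v in features.items())
    err = pred - label
    return {k: err * v for k, v in features.items()}

def step(data, model, lr=0.1):
    """Complete model update from the aggregated partial updates."""
    grads = join_and_cogroup(data, model, keys_for, g)
    return {k: w - lr * grads.get(k, 0.0) for k, w in model.items()}

data = [({"a": 1.0}, 2.0), ({"a": 1.0, "b": 2.0}, 0.0)]
model = {"a": 0.0, "b": 0.0, "c": 5.0}
new_model = step(data, model)
```

In this toy version Θ(x) is materialized per object inside a loop, which is harmless at small scale. The blow-up the abstract warns about would arise if every (x, Θ(x)) pair were materialized at once across a large data set, which is why the choice of physical plan matters in a distributed system.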
Pages: 2079 - 2092
Page count: 14
Related papers
50 in total
  • [1] User-defined Machine Learning Functions
    Herrmann, Markus
    Fiedler, Marc
    3RD INTERNATIONAL CONFERENCE ON ADVANCED RESEARCH METHODS AND ANALYTICS (CARMA 2020), 2020, : 337 - 337
  • [2] Optimization of Complex Dataflows with User-Defined Functions
    Rheinlaender, Astrid
    Leser, Ulf
    Graefe, Goetz
    ACM COMPUTING SURVEYS, 2017, 50 (03)
  • [3] Compressed Linear Algebra for Declarative Large-Scale Machine Learning
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    COMMUNICATIONS OF THE ACM, 2019, 62 (05) : 83 - 91
  • [4] Declarative knowledge extraction with iterative user-defined aggregates
    Giannotti, F
    Manco, G
    FLEXIBLE QUERY ANSWERING SYSTEMS: RECENT ADVANCES, 2001, : 435 - 444
  • [5] Optimization Methods for Large-Scale Machine Learning
    Bottou, Leon
    Curtis, Frank E.
    Nocedal, Jorge
    SIAM REVIEW, 2018, 60 (02) : 223 - 311
  • [6] Enforcing User-defined Management Logic in Large Scale Systems
    Perera, Srinath
    Gannon, Dennis
    2009 IEEE CONGRESS ON SERVICES (SERVICES-1 2009), VOLS 1 AND 2, 2009, : 243 - 250
  • [7] Analyzing large-scale Data Cubes with user-defined algorithms: A cloud-native approach
    Xu, Chen
    Du, Xiaoping
    Jian, Hongdeng
    Dong, Yi
    Qin, Wei
    Mu, Haowei
    Yan, Zhenzhen
    Zhu, Junjie
    Fan, Xiangtao
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2022, 109
  • [8] Mastering user-defined conversion functions
    Meyers, Scott
    C/C++ Users Journal, 1995, 13 (08):
  • [9] Consolidation of Queries with User-Defined Functions
    Sousa, Marcelo
    Dillig, Isil
    Vytiniotis, Dimitrios
    Dillig, Thomas
    Gkantsidis, Christos
    ACM SIGPLAN NOTICES, 2014, 49 (06) : 554 - 564
  • [10] Optimization of queries with user-defined predicates
    Chaudhuri, S
    Shim, K
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 1999, 24 (02) : 177 - 228