Declarative Parameterizations of User-Defined Functions for Large-Scale Machine Learning and Optimization

Cited by: 8
Authors
Gao, Zekai J. [1 ]
Pansare, Niketan [2 ]
Jermaine, Christopher [1 ]
Affiliations
[1] Rice Univ, Houston, TX 77251 USA
[2] IBM Res Almaden, San Jose, CA 95120 USA
Funding
National Science Foundation (USA);
Keywords
Large-scale machine learning; user-defined functions; declarative systems; join-and-co-group; MAP-REDUCE; ARCHITECTURE;
DOI
10.1109/TKDE.2018.2873325
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale optimization has become an important application for data management systems, particularly in the context of statistical machine learning. In this paper, we consider how one might implement the join-and-co-group pattern in the context of a fully declarative data processing system. The join-and-co-group pattern is ubiquitous in iterative, large-scale optimization. In this pattern, a user-defined function g is parameterized with a data object x as well as the subset of the statistical model Θ(x) that applies to that object, so that g(x | Θ(x)) can be used to compute a partial update of the model. This is repeated for every x in the full data set X. All partial updates are then aggregated and used to perform a complete update of the model. The join-and-co-group pattern has several implementation challenges, including the potential for a massive blow-up in the size of a fully parameterized model. Thus, unless the correct physical execution plan is chosen for implementing the pattern, an execution can easily take a very long time or even fail to complete. In this paper, we carefully consider the alternatives for implementing the join-and-co-group pattern on top of a declarative system, as well as how the best alternative can be selected automatically. Our focus is on the SimSQL database system, which is an SQL-based system with special facilities for large-scale, iterative optimization. Because SimSQL has a query optimizer, the choice among these alternatives can be made automatically.
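The pattern described in the abstract can be illustrated with a minimal in-memory sketch. This is not SimSQL code or the authors' implementation: the function names (`join_and_cogroup_step`, `keys_for`, `squared_error_gradient`), the dictionary-based model, and the gradient-descent update are all assumptions chosen for illustration. The sketch deliberately materializes the join of every x with its model entries, which is the naive plan whose size can blow up on real data.

```python
from collections import defaultdict


def join_and_cogroup_step(data, model, keys_for, g, learning_rate=0.1):
    """One iteration of a (hypothetical) join-and-co-group update.

    data     : list of data objects x
    model    : dict mapping a model key to its current value
    keys_for : function returning the model keys that apply to x
    g        : user-defined function g(x, theta_x) -> dict of partial updates
    """
    # Join: pair every data object with each model entry it needs.
    # This list is the "fully parameterized" intermediate result.
    joined = [(i, k, model[k]) for i, x in enumerate(data) for k in keys_for(x)]

    # Co-group: gather the joined entries per data object to form Theta(x).
    theta_of = defaultdict(dict)
    for i, k, value in joined:
        theta_of[i][k] = value

    # Apply g(x | Theta(x)) to every x and aggregate the partial updates.
    total = defaultdict(float)
    for i, x in enumerate(data):
        for k, delta in g(x, theta_of[i]).items():
            total[k] += delta

    # Complete update of the model (assumed here: a plain gradient step).
    return {k: v - learning_rate * total.get(k, 0.0) for k, v in model.items()}


def squared_error_gradient(x, theta_x):
    """Example g: gradient of 0.5 * (prediction - label)^2 for a sparse linear model."""
    features, label = x
    prediction = sum(theta_x[k] * features[k] for k in features)
    error = prediction - label
    return {k: error * features[k] for k in features}


if __name__ == "__main__":
    # Each data object is (sparse feature dict, label); Theta(x) is the
    # set of weights for the features that appear in x.
    data = [({"a": 1.0}, 1.0), ({"a": 1.0, "b": 2.0}, 2.0)]
    model = {"a": 0.0, "b": 0.0, "c": 0.0}
    updated = join_and_cogroup_step(
        data, model, keys_for=lambda x: x[0].keys(), g=squared_error_gradient
    )
    print(updated)
```

In this toy version the intermediate `joined` list has one row per (data object, model key) pair, so its size grows with the number of keys each x touches. A declarative system with an optimizer would choose among physical plans rather than always building this intermediate result.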
Pages: 2079 - 2092
Page count: 14