Declarative Parameterizations of User-Defined Functions for Large-Scale Machine Learning and Optimization

Cited by: 8
Authors
Gao, Zekai J. [1 ]
Pansare, Niketan [2 ]
Jermaine, Christopher [1 ]
Affiliations
[1] Rice Univ, Houston, TX 77251 USA
[2] IBM Res Almaden, San Jose, CA 95120 USA
Funding
National Science Foundation (USA);
Keywords
Large-scale machine learning; user-defined functions; declarative systems; join-and-co-group; MAP-REDUCE; ARCHITECTURE;
DOI
10.1109/TKDE.2018.2873325
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale optimization has become an important application for data management systems, particularly in the context of statistical machine learning. In this paper, we consider how one might implement the join-and-co-group pattern in the context of a fully declarative data processing system. The join-and-co-group pattern is ubiquitous in iterative, large-scale optimization. In the join-and-co-group pattern, a user-defined function g is parameterized with a data object x as well as the subset of the statistical model Θ(x) that applies to that object, so that g(x | Θ(x)) can be used to compute a partial update of the model. This is repeated for every x in the full data set X. All partial updates are then aggregated and used to perform a complete update of the model. The join-and-co-group pattern has several implementation challenges, including the potential for a massive blow-up in the size of a fully parameterized model. Thus, unless the correct physical execution plan is chosen for implementing the join-and-co-group pattern, an execution can easily take a very long time or even fail to complete. In this paper, we carefully consider the alternatives for implementing the join-and-co-group pattern on top of a declarative system, as well as how the best alternative can be selected automatically. Our focus is on the SimSQL database system, which is an SQL-based system with special facilities for large-scale, iterative optimization. Since it is an SQL-based system with a query optimizer, those choices can be made automatically.
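The pattern described in the abstract can be illustrated with a minimal in-memory sketch. This is not SimSQL code and not the authors' implementation; the record layout (`keys`, `values`, `label`), the squared-error form of `g`, and the names `theta_of` and `join_and_co_group_step` are all assumptions chosen for illustration. It only shows the three logical steps: join each x with Θ(x), apply g(x | Θ(x)), then aggregate the partial updates.

```python
from collections import defaultdict

def theta_of(x, model):
    """Join step: pick the subset of the model that applies to x (Theta(x))."""
    return {k: model[k] for k in x["keys"]}

def g(x, theta_x):
    """Hypothetical UDF: squared-error gradient for a sparse linear model."""
    pred = sum(theta_x[k] * v for k, v in zip(x["keys"], x["values"]))
    err = pred - x["label"]
    # Partial update: one gradient entry per parameter that x touches.
    return {k: err * v for k, v in zip(x["keys"], x["values"])}

def join_and_co_group_step(data, model, step_size=0.1):
    """One iteration: compute g(x | Theta(x)) for every x, then aggregate."""
    totals = defaultdict(float)
    for x in data:
        partial = g(x, theta_of(x, model))
        for k, d in partial.items():
            totals[k] += d  # co-group partial updates by parameter key
    # Complete model update from the aggregated partial updates.
    return {k: w - step_size * totals.get(k, 0.0) for k, w in model.items()}

model = {"a": 0.0, "b": 0.0, "c": 0.0}
data = [
    {"keys": ["a", "b"], "values": [1.0, 2.0], "label": 1.0},
    {"keys": ["b", "c"], "values": [1.0, 1.0], "label": 2.0},
]
new_model = join_and_co_group_step(data, model)
```

In this toy version Θ(x) is materialized once per data object, which is exactly where the blow-up mentioned in the abstract would arise at scale; a declarative system would instead leave the choice of physical plan to its optimizer.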
Pages: 2079-2092
Page count: 14
Related papers
50 in total
  • [21] Efficient Execution of User-Defined Functions in SQL Queries
    Foufoulas, Yannis
    Simitsis, Alkis
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (12): 3874 - 3877
  • [22] User-Defined Financial Functions for MS SQL Server
    Gubalova, Jolana
    Medvedova, Petra
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2018, 9 (09) : 19 - 25
  • [23] YeSQL: Rich User-Defined Functions without the Overhead
    Foufoulas, Yannis
    Simitsis, Alkis
    Ioannidis, Yannis
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 15 (12): 3730 - 3733
  • [24] User-defined functions in the Arden Syntax: An extension proposal
    Karadimas, Harry
    Ebrahiminia, Vahid
    Lepage, Eric
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2018, 92 : 103 - 110
  • [25] Sharing Queries with Nonequivalent User-defined Aggregate Functions
    Zhang, Chao
    Farouk, Toumani
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2024, 49 (02):
  • [26] Parameterizations for large-scale variational system identification using unconstrained optimization
    Dutra, Dimas Abreu Archanjo
    AUTOMATICA, 2025, 173
  • [27] Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning
    Yang, Zhuang
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [28] Machine Learning Based Graph Mining of Large-scale Network and Optimization
    Liu, Mingyue
    PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021,
  • [29] A Survey on Large-Scale Machine Learning
    Wang, Meng
    Fu, Weijie
    He, Xiangnan
    Hao, Shijie
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06) : 2574 - 2594
  • [30] A declarative model specification system allowing NeuroML to be extended with user-defined component types
    Robert Cannon
    Padraig Gleeson
    Sharon Crook
    R Angus Silver
    BMC Neuroscience, 13 (Suppl 1)