An Intermediate Representation for Optimizing Machine Learning Pipelines

Cited by: 22
Authors
Kunft, Andreas [1]
Katsifodimos, Asterios [2]
Schelter, Sebastian [3]
Bress, Sebastian [1, 4]
Rabl, Tilmann [5]
Markl, Volker [1, 4]
Affiliations
[1] TU Berlin, Berlin, Germany
[2] Delft Univ Technol, Delft, Netherlands
[3] NYU, New York, NY 10003 USA
[4] DFKI, Kaiserslautern, Germany
[5] Univ Potsdam, HPI, Potsdam, Germany
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2019, Vol. 12, No. 11
Keywords
SCALABLE LINEAR ALGEBRA; SYSTEMS; PLANS;
DOI
10.14778/3342263.3342633
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Machine learning (ML) pipelines for model training and validation typically include preprocessing, such as data cleaning and feature engineering, prior to training an ML model. Preprocessing combines relational algebra and user-defined functions (UDFs), while model training uses iterations and linear algebra. Current systems are tailored to either of the two. As a consequence, preprocessing and ML steps are optimized in isolation. To enable holistic optimization of ML training pipelines, we present Lara, a declarative domain-specific language for collections and matrices. Lara's intermediate representation (IR) reflects on the complete program, i.e., UDFs, control flow, and both data types. Two views on the IR enable diverse optimizations. Monads enable operator pushdown and fusion across type and loop boundaries. Combinators provide the semantics of domain-specific operators and optimize data access and cross-validation of ML algorithms. Our experiments on preprocessing pipelines and selected ML algorithms show the effects of our proposed optimizations on dense and sparse data, which achieve speedups of up to an order of magnitude.
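The operator fusion mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Lara's API or the authors' implementation; the function names (`unfused`, `fused`, `clean`, `keep`, `featurize`) and the sample rows are hypothetical and chosen only for illustration. The sketch shows the general idea: a cleaning UDF, a filter, and a feature-extraction step are first run as three separate passes with intermediate collections, then as one fused pass that emits matrix rows directly.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


def unfused(rows: List[Row],
            clean: Callable[[Row], Row],
            keep: Callable[[Row], bool],
            featurize: Callable[[Row], List[float]]) -> List[List[float]]:
    """Three passes, each materializing an intermediate collection."""
    cleaned = [clean(r) for r in rows]        # pass 1: data cleaning UDF
    kept = [r for r in cleaned if keep(r)]    # pass 2: relational filter
    return [featurize(r) for r in kept]       # pass 3: collection -> matrix rows


def fused(rows: List[Row],
          clean: Callable[[Row], Row],
          keep: Callable[[Row], bool],
          featurize: Callable[[Row], List[float]]) -> List[List[float]]:
    """One pass: the same three operators applied per row, no intermediates."""
    matrix: List[List[float]] = []
    for r in rows:
        c = clean(r)
        if keep(c):
            matrix.append(featurize(c))
    return matrix


# Hypothetical sample data and UDFs (assumptions for this sketch only).
rows: List[Row] = [{"x": 2.0}, {"x": None}, {"x": -1.0}, {"x": 3.0}]


def clean(r: Row) -> Row:
    # Replace a missing value with 0.0.
    return {"x": 0.0 if r["x"] is None else r["x"]}


def keep(r: Row) -> bool:
    # Drop rows with a negative value.
    return r["x"] >= 0.0


def featurize(r: Row) -> List[float]:
    # Emit the value and its square as a feature vector.
    return [r["x"], r["x"] * r["x"]]


result_unfused = unfused(rows, clean, keep, featurize)
result_fused = fused(rows, clean, keep, featurize)
```

Both variants produce the same feature matrix; the fused one avoids building the two intermediate lists. An optimizer that sees the whole program, UDFs included, can apply this kind of rewrite automatically instead of leaving it to the user.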
Pages: 1553-1567
Page count: 15