Data Management in Machine Learning Systems

Authors
Boehm, Matthias [1]
Kumar, Arun [2]
Yang, Jun [3]
Affiliations
[1] Graz University of Technology, Austria
[2] University of California, San Diego, United States
[3] Duke University, United States
Source
Synthesis Lectures on Data Management | 2019 / Vol. 11 / No. 1
Keywords
Information management
DOI
10.2200/S00895ED1V01Y201901DTM057
Abstract
Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide means of specifying and executing these ML workloads in an efficient and scalable manner. Data management is at the heart of many ML systems due to data-driven application characteristics, data-centric workload characteristics, and system architectures inspired by classical data management techniques. In this book, we follow this data-centric view of ML systems and aim to provide a comprehensive overview of data management in ML systems for the end-to-end data science or ML lifecycle. We review multiple interconnected lines of work: (1) ML support in database (DB) systems, (2) DB-inspired ML systems, and (3) ML lifecycle systems. Covered topics include: in-database analytics via query generation and user-defined functions, factorized and statistical-relational learning; optimizing compilers for ML workloads; execution strategies and hardware accelerators; data access methods such as compression, partitioning and indexing; resource elasticity and cloud markets; as well as systems for data preparation for ML, model selection, model management, model debugging, and model serving. Given the rapidly evolving field, we strive for a balance between an up-to-date survey of ML systems, an overview of the underlying concepts and techniques, as well as pointers to open research questions. Hence, this book might serve as a starting point for both systems researchers and developers. © 2019 by Morgan & Claypool.
Pages: 1 / 173