Data Management in Machine Learning Systems

Authors
Boehm, Matthias [1]
Kumar, Arun [2]
Yang, Jun [3]
Affiliations
[1] Graz University of Technology, Austria
[2] University of California, San Diego, United States
[3] Duke University, United States
Source
Synthesis Lectures on Data Management | 2019 / Vol. 11 / No. 1
Keywords
Information management
DOI
10.2200/S00895ED1V01Y201901DTM057
Abstract
Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide means of specifying and executing these ML workloads in an efficient and scalable manner. Data management is at the heart of many ML systems due to data-driven application characteristics, data-centric workload characteristics, and system architectures inspired by classical data management techniques. In this book, we follow this data-centric view of ML systems and aim to provide a comprehensive overview of data management in ML systems for the end-to-end data science or ML lifecycle. We review multiple interconnected lines of work: (1) ML support in database (DB) systems, (2) DB-inspired ML systems, and (3) ML lifecycle systems. Covered topics include: in-database analytics via query generation and user-defined functions, factorized and statistical-relational learning; optimizing compilers for ML workloads; execution strategies and hardware accelerators; data access methods such as compression, partitioning and indexing; resource elasticity and cloud markets; as well as systems for data preparation for ML, model selection, model management, model debugging, and model serving. Given the rapidly evolving field, we strive for a balance between an up-to-date survey of ML systems, an overview of the underlying concepts and techniques, as well as pointers to open research questions. Hence, this book might serve as a starting point for both systems researchers and developers. © 2019 by Morgan & Claypool.
Pages: 1-173