The Hopsworks Feature Store for Machine Learning

Authors
Martinez, Javier de la Rua [1, 2]
Buso, Fabio [1]
Kouzoupis, Antonios [1]
Ormenisan, Alexandru A. [1]
Niazi, Salman [1]
Bzhalava, Davit [1]
Mak, Kenneth [1]
Jouffrey, Victor [1]
Ronstrom, Mikael [1]
Cunningham, Raymond [1]
Zangis, Ralfs [1]
Mukhedkar, Dhananjay [1]
Khazanchi, Ayushman [2]
Vlassov, Vladimir [2]
Dowling, Jim [1, 2]
Affiliations
[1] Hopsworks AB, Stockholm, Sweden
[2] KTH Royal Institute of Technology, Stockholm, Sweden
Funding
European Union Horizon 2020
Keywords
Feature Store; MLOps; RonDB; Arrow Flight; DuckDB
DOI
10.1145/3626246.3653389
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data management is the most challenging aspect of building Machine Learning (ML) systems. ML systems can read large volumes of historical data when training models, but inference workloads are more varied, depending on whether the system is a batch or an online ML system. The feature store for ML has recently emerged as a single data platform for managing ML data throughout the ML lifecycle, from feature engineering to model training to inference. In this paper, we present the Hopsworks feature store for machine learning as a highly available platform for managing feature data with API support for columnar, row-oriented, and similarity search query workloads. We introduce and address the challenges that feature stores solve: feature reuse, how to organize data transformations, and how to ensure correct and consistent data between feature engineering, model training, and model inference. We present the engineering challenges in building high-performance query services for a feature store and show how Hopsworks outperforms existing cloud feature stores for training and online inference query workloads.
Pages: 135-147
Page count: 13