The Hopsworks Feature Store for Machine Learning

Authors
Martinez, Javier de la Rua [1, 2]
Buso, Fabio [1]
Kouzoupis, Antonios [1]
Ormenisan, Alexandru A. [1]
Niazi, Salman [1]
Bzhalava, Davit [1]
Mak, Kenneth [1]
Jouffrey, Victor [1]
Ronstrom, Mikael [1]
Cunningham, Raymond [1]
Zangis, Ralfs [1]
Mukhedkar, Dhananjay [1]
Khazanchi, Ayushman [2]
Vlassov, Vladimir [2]
Dowling, Jim [1, 2]
Affiliations
[1] Hopsworks AB, Stockholm, Sweden
[2] KTH Royal Institute of Technology, Stockholm, Sweden
Funding
European Union Horizon 2020
Keywords
Feature Store; MLOps; RonDB; Arrow Flight; DuckDB
DOI
10.1145/3626246.3653389
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data management is the most challenging aspect of building Machine Learning (ML) systems. ML systems can read large volumes of historical data when training models, but inference workloads are more varied, depending on whether the system is a batch or an online ML system. The feature store for ML has recently emerged as a single data platform for managing ML data throughout the ML lifecycle, from feature engineering to model training to inference. In this paper, we present the Hopsworks feature store for machine learning as a highly available platform for managing feature data with API support for columnar, row-oriented, and similarity search query workloads. We introduce and address challenges solved by feature stores related to feature reuse, how to organize data transformations, and how to ensure correct and consistent data between feature engineering, model training, and model inference. We present the engineering challenges in building high-performance query services for a feature store and show how Hopsworks outperforms existing cloud feature stores for training and online inference query workloads.
Pages: 135-147
Page count: 13