The Hopsworks Feature Store for Machine Learning

Authors
Martinez, Javier de la Rua [1, 2]
Buso, Fabio [1]
Kouzoupis, Antonios [1]
Ormenisan, Alexandru A. [1]
Niazi, Salman [1]
Bzhalava, Davit [1]
Mak, Kenneth [1]
Jouffrey, Victor [1]
Ronstrom, Mikael [1]
Cunningham, Raymond [1]
Zangis, Ralfs [1]
Mukhedkar, Dhananjay [1]
Khazanchi, Ayushman [2]
Vlassov, Vladimir [2]
Dowling, Jim [1, 2]
Affiliations
[1] Hopsworks AB, Stockholm, Sweden
[2] KTH Royal Institute of Technology, Stockholm, Sweden
Funding
EU Horizon 2020
Keywords
Feature Store; MLOps; RonDB; Arrow Flight; DuckDB
DOI
10.1145/3626246.3653389
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data management is the most challenging aspect of building Machine Learning (ML) systems. ML systems can read large volumes of historical data when training models, but inference workloads are more varied, depending on whether it is a batch or online ML system. The feature store for ML has recently emerged as a single data platform for managing ML data throughout the ML lifecycle, from feature engineering to model training to inference. In this paper, we present the Hopsworks feature store for machine learning as a highly available platform for managing feature data with API support for columnar, row-oriented, and similarity search query workloads. We introduce and address challenges solved by the feature stores related to feature reuse, how to organize data transformations, and how to ensure correct and consistent data between feature engineering, model training, and model inference. We present the engineering challenges in building high-performance query services for a feature store and show how Hopsworks outperforms existing cloud feature stores for training and online inference query workloads.
Pages: 135-147
Number of pages: 13