Data pricing in machine learning pipelines

Cited by: 0

Authors:

Zicun Cong

Xuan Luo

Jian Pei

Feida Zhu

Yong Zhang

Affiliations:

[1] Simon Fraser University

[2] Singapore Management University

[3] Huawei Technologies Canada

Source:

Knowledge and Information Systems | 2022 / Volume 64

Keywords:

Data assets; Data pricing; Data products; Machine learning; AI

DOI:

Not available

Abstract:

Machine learning is disruptive. At the same time, machine learning can succeed only through collaboration among many parties across multiple steps that naturally form pipelines in an ecosystem, such as collecting data for possible machine learning applications, collaboratively training models among multiple parties, and delivering machine learning services to end users. Data are critical and pervasive throughout machine learning pipelines. Because these pipelines involve many parties and, to be successful, must form a constructive and dynamic ecosystem, marketplaces and data pricing are fundamental in connecting those parties and facilitating their collaboration. In this article, we survey the principles and the latest research developments of data pricing in machine learning pipelines. We start with a brief review of data marketplaces and pricing desiderata. Then, we focus on pricing in three important steps in machine learning pipelines. To understand pricing in the step of training data collection, we review the pricing of raw data sets and data labels. We also investigate pricing in the step of collaborative training of machine learning models and give an overview of pricing machine learning models for end users in the step of machine learning deployment. Finally, we discuss a series of possible future directions.

Pages: 1417-1455

Number of pages: 38

[1] Cong, Zicun; Luo, Xuan; Pei, Jian; Zhu, Feida; Zhang, Yong. Data pricing in machine learning pipelines. Knowledge and Information Systems, 2022, 64 (6): 1417-1455.