Systematic review of data-centric approaches in artificial intelligence and machine learning

Authors
Singh P. [1]
Affiliation
[1] Wellington, New Zealand
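Source
Data Science and Management | 2023, Vol. 6, Issue 03
Funding
National Science Foundation (USA); Wellcome Trust (UK); European Research Council; National Institutes of Health (USA);
Keywords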
Data management; Data preprocessing; Data-centric; Machine learning; MLOps; Semi-supervised learning; Technical debt;
DOI
10.1016/j.dsm.2023.06.001
Abstract
Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) AI algorithms have been developed to improve the performance of AI-oriented structures. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach for solving machine learning (ML) problems. It is a collection of data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented, and researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers are already using, intentionally or unintentionally, to improve the quality of AI systems: big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. In addition, it highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into, or outright collide with, business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches. © 2023 Xi'an Jiaotong University
Pages: 144-157
Number of pages: 13