GitWorkflow for Active Learning: A Development Methodology Proposal for Data-Centric AI Projects

Cited by: 0
Authors
Stieler, Fabian [1]
Bauer, Bernhard [1]
Affiliation
[1] Univ Augsburg, Inst Comp Sci, Augsburg, Germany
Keywords
Active Learning; Software Engineering for Machine Learning; Machine Learning Operations
DOI
10.5220/0011988400003464
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
As soon as Artificial Intelligence (AI) projects grow from small feasibility studies to mature projects, developers and data scientists face new challenges, such as collaboration with other developers, versioning data, or traceability of model metrics and other resulting artifacts. This paper suggests a data-centric AI project with an Active Learning (AL) loop from a developer perspective and presents "Git Workflow for AL": A methodology proposal to guide teams on how to structure a project and solve implementation challenges. We introduce principles for data, code, as well as automation, and present a new branching workflow. The evaluation shows that the proposed method is an enabler for fulfilling established best practices.
Pages: 202-213
Number of pages: 12