GitWorkflow for Active Learning: A Development Methodology Proposal for Data-Centric AI Projects

Cited by: 0
Authors
Stieler, Fabian [1]
Bauer, Bernhard [1]
Affiliations
[1] Univ Augsburg, Inst Comp Sci, Augsburg, Germany
Keywords
Active Learning; Software Engineering for Machine Learning; Machine Learning Operations
DOI
10.5220/0011988400003464
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
As soon as Artificial Intelligence (AI) projects grow from small feasibility studies to mature projects, developers and data scientists face new challenges, such as collaboration with other developers, versioning data, or traceability of model metrics and other resulting artifacts. This paper suggests a data-centric AI project with an Active Learning (AL) loop from a developer perspective and presents "Git Workflow for AL": a methodology proposal to guide teams on how to structure a project and solve implementation challenges. We introduce principles for data, code, and automation, and present a new branching workflow. The evaluation shows that the proposed method is an enabler for fulfilling established best practices.
Pages: 202-213
Page count: 12
Related papers
50 in total
  • [31] Taxonomy of machine learning paradigms: A data-centric perspective
    Emmert-Streib, Frank
    Dehmer, Matthias
    [J]. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2022, 12 (05)
  • [32] Analyzing Data-Centric Properties for Graph Contrastive Learning
    Trivedi, Puja
    Lubana, Ekdeep Singh
    Heimann, Mark
    Koutra, Danai
    Thiagarajan, Jayaraman J.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [33] D-AI2-M: Ethanol Production Forecasting in Brazil Using Data-Centric Artificial Intelligence Methodology
    Mello, Antonio
    Giusti, Lucas
    Tavares, Tarsila
    Alexandrino, Fernando
    Guedes, Gustavo
    Soares, Jorge
    Barbastefano, Rafael
    Porto, Fabio
    Carvalho, Diego
    Ogasawara, Eduardo
    [J]. IEEE Latin America Transactions, 2024, 22 (11) : 899 - 910
  • [34] wProjects: Data-centric Web Development for Female Nonprogrammers
    Harshbarger, Nicole L.
    Rosson, Mary Beth
    [J]. 2012 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING (VL/HCC), 2012 : 67 - 70
  • [35] IoT in the Fog: A Roadmap for Data-Centric IoT Development
    Oteafy, Sharief M. A.
    Hassanein, Hossam S.
    [J]. IEEE COMMUNICATIONS MAGAZINE, 2018, 56 (03) : 157 - 163
  • [36] ydata-profiling: Accelerating data-centric AI with high-quality data
    Clemente, Fabiana
    Ribeiro, Goncalo Martins
    Quemy, Alexandre
    Santos, Miriam Seoane
    Pereira, Ricardo Cardoso
    Barros, Alex
    [J]. NEUROCOMPUTING, 2023, 554
  • [37] A Reverse Data-Centric Process Design Methodology for Public Administration Processes
    Kiss, Peter Jozsef
    Klimko, Gabor
    [J]. ELECTRONIC GOVERNMENT AND THE INFORMATION SYSTEMS PERSPECTIVE, EGOVIS 2019, 2019, 11709 : 85 - 99
  • [38] Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark
    Hansen, Lasse
    Seedat, Nabeel
    van der Schaar, Mihaela
    Petrovic, Andrija
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [39] A data-centric review of deep transfer learning with applications to text data
    Bashath, Samar
    Perera, Nadeesha
    Tripathi, Shailesh
    Manjang, Kalifa
    Dehmer, Matthias
    Streib, Frank Emmert
    [J]. INFORMATION SCIENCES, 2022, 585 : 498 - 528
  • [40] Proposal of Data-Centric Network for Mobile and Dynamic Machine-to-Machine Communication
    Matsubara, Daisuke
    Yabusaki, Hitoshi
    Okamoto, Satoru
    Yamanaka, Naoaki
    Takahashi, Tatsuro
    [J]. IEICE TRANSACTIONS ON COMMUNICATIONS, 2013, E96B (11) : 2795 - 2806