An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems

Cited by: 24
Authors
Tang, Yiming [1]
Khatchadourian, Raffi [1, 2]
Bagherzadeh, Mehdi [3]
Singh, Rhia [4]
Stewart, Ajani [2]
Raja, Anita [1, 2]
Affiliations
[1] CUNY, Grad Ctr, New York, NY 10036 USA
[2] CUNY Hunter Coll, New York, NY 10021 USA
[3] Oakland Univ, Rochester, MI 48063 USA
[4] CUNY, Macaulay Honors Coll, New York, NY USA
Keywords
empirical studies; refactoring; machine learning; systems; technical debt; software repository mining; SOFTWARE;
DOI
10.1109/ICSE43902.2021.00033
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Machine Learning (ML) systems, i.e., those with ML capabilities, including Deep Learning (DL) systems, are pervasive in today's data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to ML systems. Unfortunately, there is a knowledge gap regarding how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML; that some refactorings correspond to established technical debt categories, while others do not; and that code duplication is a major crosscutting theme, particularly involving ML configuration and model code, which was also the most refactored. We also introduce 14 new ML-specific refactorings and 7 new ML-specific technical debt categories, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
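The abstract identifies duplicated ML configuration and model code as a major crosscutting theme and the most refactored area. The sketch below is a hypothetical illustration, not code from the paper or the studied projects: it assumes two scripts that each hard-code the same hyperparameters, and shows an Extract Function-style refactoring that moves the shared settings into one function (`base_model_config` and the other function names are made up). Because a refactoring must preserve semantics, the final assertions check that each resulting configuration is unchanged.

```python
# Hypothetical example, not taken from the paper: duplicated ML configuration.

# Before: training and evaluation code each hard-code the same model settings.
def train_config_before():
    return {"layers": [128, 64], "dropout": 0.2, "learning_rate": 1e-3, "epochs": 10}


def eval_config_before():
    return {"layers": [128, 64], "dropout": 0.2, "learning_rate": 1e-3, "batch_size": 256}


# After: the shared model settings live in one place (hypothetical name).
def base_model_config():
    return {"layers": [128, 64], "dropout": 0.2, "learning_rate": 1e-3}


# Each use site merges the shared settings with the keys specific to it.
def train_config_after():
    return {**base_model_config(), "epochs": 10}


def eval_config_after():
    return {**base_model_config(), "batch_size": 256}


if __name__ == "__main__":
    # Semantics preservation: every configuration is equal before and after.
    assert train_config_before() == train_config_after()
    assert eval_config_before() == eval_config_after()
    print("configurations unchanged")
```

Since `base_model_config` returns a fresh dictionary on each call, a use site that mutates its own configuration cannot affect the other. A real project might instead load the shared settings from a configuration file.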
Pages: 238-250
Number of pages: 13
Related papers (50 in total)
  • [41] Ontology for Technical Debt in Systems Engineering
    Kleinwaks, Howard
    Batchelor, Ann
    Bradley, Thomas H.
    IEEE Open Journal of Systems Engineering, 2023, 1 : 111 - 122
  • [42] Technical Debt in Automated Production Systems
    Vogel-Heuser, Birgit
    Roesch, Susanne
    Martini, Antonio
    Tichy, Matthias
    2015 IEEE 7TH INTERNATIONAL WORKSHOP ON MANAGING TECHNICAL DEBT (MTD) PROCEEDINGS, 2015, : 49 - 52
  • [43] An Empirical Study of Testing Machine Learning in the Wild
    Openja, Moses
    Khomh, Foutse
    Foundjem, Armstrong
    Jiang, Zhen Ming
    Abidi, Mouna
    Hassan, Ahmed E.
    ACM Transactions on Software Engineering and Methodology, 2024, 34 (01)
  • [44] Technical Debt in Hardware Systems and Elements
    Rosser, Larri Ann
    Ouzzif, Zakaria
    2021 IEEE AEROSPACE CONFERENCE (AEROCONF 2021), 2021,
  • [45] Using Machine Learning to Guide the Application of Software Refactorings: A Preliminary Exploration
    Nikolaidis, Nikolaos
    Zisis, Dimitrios
    Ampatzoglou, Apostolos
    Mittas, Nikolaos
    Chatzigeorgiou, Alexander
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING TECHNIQUES FOR SOFTWARE QUALITY EVALUATION, MALTESQUE 2022, 2022, : 23 - 28
  • [46] DebtHunter: A Machine Learning-based Approach for Detecting Self-Admitted Technical Debt
    Sala, Irene
    Tommasel, Antonela
    Fontana, Francesca Arcelli
    PROCEEDINGS OF EVALUATION AND ASSESSMENT IN SOFTWARE ENGINEERING (EASE 2021), 2021, : 278 - 283
  • [47] The Perception of Technical Debt in the Embedded Systems Domain: An Industrial Case Study
    Ampatzoglou, Areti
    Ampatzoglou, Apostolos
    Chatzigeorgiou, Alexander
    Avgeriou, Paris
    Abrahamsson, Pekka
    Martini, Antonio
    Zdun, Uwe
    Systa, Kari
    2016 IEEE 8TH INTERNATIONAL WORKSHOP ON MANAGING TECHNICAL DEBT (MTD), 2016, : 9 - 16
  • [48] Technical debt as an indicator of software security risk: a machine learning approach for software development enterprises
    Siavvas, Miltiadis
    Tsoukalas, Dimitrios
    Jankovic, Marija
    Kehagias, Dionysios
    Tzovaras, Dimitrios
    ENTERPRISE INFORMATION SYSTEMS, 2022, 16 (05)
  • [49] Towards Collaborative Technical Debt Management in Systems of Systems
    Schuetz, Johann
    Gomez, Jorge Marx
    2020 IEEE/ACM INTERNATIONAL CONFERENCE ON TECHNICAL DEBT, TECHDEBT, 2020, : 87 - 91
  • [50] An empirical study of the impact of OCL smells and refactorings on the understandability of OCL specifications
    Correa, Alexandre
    Werner, Claudia
    Barros, Marcio
    MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, PROCEEDINGS, 2007, 4735 : 76 - +