An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems

Cited by: 24
Authors
Tang, Yiming [1]
Khatchadourian, Raffi [1, 2]
Bagherzadeh, Mehdi [3]
Singh, Rhia [4]
Stewart, Ajani [2]
Raja, Anita [1, 2]
Affiliations
[1] CUNY, Grad Ctr, New York, NY 10036 USA
[2] CUNY Hunter Coll, New York, NY 10021 USA
[3] Oakland Univ, Rochester, MI 48063 USA
[4] CUNY, Macaulay Honors Coll, New York, NY USA
Keywords
empirical studies; refactoring; machine learning; systems; technical debt; software repository mining; SOFTWARE;
DOI
10.1109/ICSE43902.2021.00033
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a knowledge gap in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that (i) developers refactor these systems for a variety of reasons, both specific and tangential to ML; (ii) some refactorings correspond to established technical debt categories, while others do not; and (iii) code duplication is a major crosscutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 and 7 new ML-specific refactorings and technical debt categories, respectively, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
Pages: 238 - 250
Page count: 13
Related papers
50 in total
  • [1] Refactorings and Technical Debt in Docker Projects: An Empirical Study
    Ksontini, Emna
    Kessentini, Marouane
    Ferreira, Thiago do N.
    Hassan, Foyzul
    [J]. 2021 36TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING ASE 2021, 2021, : 781 - 791
  • [2] Hidden Technical Debt in Machine Learning Systems
    Sculley, D.
    Holt, Gary
    Golovin, Daniel
    Davydov, Eugene
    Phillips, Todd
    Ebner, Dietmar
    Chaudhary, Vinay
    Young, Michael
    Crespo, Jean-Francois
    Dennison, Dan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [3] Empirical Analysis of Hidden Technical Debt Patterns in Machine Learning Software
    Alahdab, Mohannad
    Calikli, Gul
    [J]. PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT, PROFES 2019, 2019, 11915 : 195 - 202
  • [4] 23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software
    OBrien, David
    Biswas, Sumon
    Imtiaz, Sayem
    Abdalkareem, Rabe
    Shihab, Emad
    Rajan, Hridesh
    [J]. PROCEEDINGS OF THE 30TH ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2022, 2022, : 734 - 746
  • [5] Machine Learning for Technical Debt Identification
    Tsoukalas, Dimitrios
    Mittas, Nikolaos
    Chatzigeorgiou, Alexander
    Kehagias, Dionysios
    Ampatzoglou, Apostolos
    Amanatidis, Theodoros
    Angelis, Lefteris
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2022, 48 (12) : 4892 - 4906
  • [6] Machine Learning for Technical Debt Identification
    Tsoukalas, Dimitrios
    Mittas, Nikolaos
    Chatzigeorgiou, Alexander
    Kehagias, Dionysios
    Ampatzoglou, Apostolos
    Amanatidis, Theodoros
    Angelis, Lefteris
    [J]. IEEE Transactions on Software Engineering, 2022, 48 (12) : 4892 - 4906
  • [7] Forecasting technical debt evolution in software systems: an empirical study
    Aversano, Lerina
    Bernardi, Mario Luca
    Cimitile, Marta
    Iammarino, Martina
    Montano, Debora
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2023, 17 (03)
  • [8] Forecasting technical debt evolution in software systems: an empirical study
    Lerina AVERSANO
    Mario Luca BERNARDI
    Marta CIMITILE
    Martina IAMMARINO
    Debora MONTANO
    [J]. Frontiers of Computer Science, 2023, 17 (03) : 68 - 80
  • [9] Machine Learning for Software Technical Debt Detection
    Kachanov, V. V.
    Markov, S. I.
    Tsurkov, V. I.
    [J]. JOURNAL OF COMPUTER AND SYSTEMS SCIENCES INTERNATIONAL, 2023, 62 (04) : 689 - 694
  • [10] Machine Learning for Software Technical Debt Detection
    V. V. Kachanov
    S. I. Markov
    V. I. Tsurkov
    [J]. Journal of Computer and Systems Sciences International, 2023, 62 : 689 - 694