Understanding Software-2.0: A Study of Machine Learning Library Usage and Evolution

Cited by: 39
Authors
Dilhara, Malinda [2]
Ketkar, Ameya [1]
Dig, Danny [2]
Affiliations
[1] Oregon State Univ, Corvallis, OR 97333 USA
[2] Univ Colorado, Boulder, CO 80301 USA
Keywords
Machine learning libraries; empirical studies; Software-2.0; SUPPORT; RECOMMENDATION;
DOI
10.1145/3453478
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Enabled by a rich ecosystem of Machine Learning (ML) libraries, programming using learned models, i.e., Software-2.0, has gained substantial adoption. However, we do not know what challenges developers encounter when they use ML libraries. With this knowledge gap, researchers miss opportunities to contribute to new research directions, tool builders do not invest resources where automation is most needed, library designers cannot make informed decisions when releasing ML library versions, and developers fail to use common practices when using ML libraries. We present the first large-scale quantitative and qualitative empirical study to shed light on how developers in Software-2.0 use ML libraries, and how this evolution affects their code. Particularly, using static analysis we perform a longitudinal study of 3,340 top-rated open-source projects with 46,110 contributors. To further understand the challenges of ML library evolution, we survey 109 developers who introduce and evolve ML libraries. Using this rich dataset we reveal several novel findings. Among others, we found an increasing trend of using ML libraries: The ratio of new Python projects that use ML libraries increased from 2% in 2013 to 50% in 2018. We identify several usage patterns including the following: (i) 36% of the projects use multiple ML libraries to implement various stages of the ML workflows, (ii) developers update ML libraries more often than the traditional libraries, (iii) strict upgrades are the most popular for ML libraries among other update kinds, (iv) ML library updates often result in cascading library updates, and (v) ML libraries are often downgraded (22.04% of cases). We also observed unique challenges when evolving and maintaining Software-2.0 such as (i) binary incompatibility of trained ML models and (ii) benchmarking ML models. 
Finally, we present actionable implications of our findings for researchers, tool builders, developers, educators, library vendors, and hardware vendors.
Pages: 42
Related papers
50 in total
  • [1] Understanding machine learning software defect predictions
    Geanderson Esteves
    Eduardo Figueiredo
    Adriano Veloso
    Markos Viggiato
    Nivio Ziviani
    [J]. Automated Software Engineering, 2020, 27 : 369 - 392
  • [2] Understanding machine learning software defect predictions
    Esteves, Geanderson
    Figueiredo, Eduardo
    Veloso, Adriano
    Viggiato, Markos
    Ziviani, Nivio
    [J]. AUTOMATED SOFTWARE ENGINEERING, 2020, 27 (3-4) : 369 - 392
  • [3] Indonesian LIS Professionals' Understanding of Library 2.0: A Pilot Study
    Mulatiningsih, Bekti
    Johnson, Kelly
    [J]. JOURNAL OF WEB LIBRARIANSHIP, 2014, 8 (03) : 286 - 304
  • [4] mechanoChemML: A software library for machine learning in computational materials physics
    Zhang, X.
    Teichert, G. H.
    Wang, Z.
    Duschenes, M.
    Srivastava, S.
    Livingston, E.
    Holber, J.
    Faghih Shojaei, M.
    Sundararajan, A.
    Garikipati, K.
    [J]. COMPUTATIONAL MATERIALS SCIENCE, 2022, 211
  • [5] An Exploratory Study on Library Aging by Monitoring Client Usage in a Software Ecosystem
    Kula, Raula Gaikovina
    German, Daniel M.
    Ishio, Takashi
    Ouni, Ali
    Inoue, Katsuro
    [J]. 2017 IEEE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION, AND REENGINEERING (SANER), 2017, : 407 - 411
  • [6] Characterizing and Understanding Software Security Vulnerabilities in Machine Learning Libraries
    Harzevili, Nima Shiri
    Shin, Jiho
    Wang, Junjie
    Wang, Song
    Nagappan, Nachiappan
    [J]. 2023 IEEE/ACM 20TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2023, : 27 - 38
  • [7] An experimental study of software engineering learning using IDE 2.0
    Itahriouan, Zakaria
    Aknin, Noura
    Abtoy, Anouar
    El Kadiri, Kamal Eddine
    [J]. 2016 4TH IEEE INTERNATIONAL COLLOQUIUM ON INFORMATION SCIENCE AND TECHNOLOGY (CIST), 2016, : 559 - 563
  • [8] An Empirical Study on the Usage of Automated Machine Learning Tools
    Majidi, Forough
    Openja, Moses
    Khomh, Foutse
    Li, Heng
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2022), 2022, : 59 - 70
  • [9] Machine Learning for Software Engineering: A Tertiary Study
    Kotti, Zoe
    Galanopoulou, Rafaila
    Spinellis, Diomidis
    [J]. ACM COMPUTING SURVEYS, 2023, 55 (12)
  • [10] Software Engineering for Machine Learning: A Case Study
    Amershi, Saleema
    Begel, Andrew
    Bird, Christian
    DeLine, Robert
    Gall, Harald
    Kamar, Ece
    Nagappan, Nachiappan
    Nushi, Besmira
    Zimmermann, Thomas
    [J]. 2019 IEEE/ACM 41ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE (ICSE-SEIP 2019), 2019, : 291 - 300