On using machine learning to automatically classify software applications into domain categories

Authors
Mario Linares-Vásquez
Collin McMillan
Denys Poshyvanyk
Mark Grechanik
Affiliations
[1] The College of William and Mary
[2] University of Notre Dame
[3] University of Illinois at Chicago
Keywords
Closed-source; Open-source; Software categorization; Machine learning
DOI
Not available
Abstract
Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and to predict maintenance problems in software projects. Manual categorization is expensive, tedious, and laborious, which is why automatic categorization approaches are gaining widespread importance. Unfortunately, for various legal and organizational reasons, the applications' source code is often not available, which makes it difficult to categorize these applications automatically. In this paper, we propose a novel approach that uses Application Programming Interface (API) calls from third-party libraries to automatically categorize the software applications that make these calls. Our approach is general: because API calls can be extracted from both source code and bytecode, it enables different categorization algorithms to be applied to repositories that contain both source code and bytecode of applications. We compare our approach to a state-of-the-art approach that uses machine learning algorithms for software categorization, and we conduct experiments on two large Java repositories: an open-source repository containing 3,286 projects and a closed-source repository with 745 applications, for which the source code was not available. Our contribution is twofold: we propose a new approach that makes it possible to categorize software projects without any source code, using a small number of API calls as attributes, and we carry out a comprehensive empirical evaluation of automatic categorization approaches.
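The idea of treating API calls as classification attributes can be illustrated with a minimal sketch. The abstract does not name the learning algorithms used, so the multinomial naive Bayes classifier below is an assumption chosen for brevity, not the authors' method. The training data, the category labels, and the function names `train` and `classify` are all hypothetical; each application is represented only by the list of Java API references that could, in principle, be extracted from either its source code or its bytecode.

```python
import math
from collections import Counter, defaultdict


def train(apps):
    """Build per-category API-call counts from (category, [api_call, ...]) pairs."""
    class_counts = Counter()            # number of applications per category
    call_counts = defaultdict(Counter)  # API-call frequencies per category
    vocab = set()                       # every API call seen in training
    for category, calls in apps:
        class_counts[category] += 1
        call_counts[category].update(calls)
        vocab.update(calls)
    return {"classes": class_counts, "calls": call_counts, "vocab": vocab}


def classify(model, calls):
    """Return the most probable category under multinomial naive Bayes
    with add-one (Laplace) smoothing. Unseen API calls are ignored."""
    total_apps = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    best, best_score = None, -math.inf
    for category, n_apps in model["classes"].items():
        counts = model["calls"][category]
        denom = sum(counts.values()) + vocab_size
        score = math.log(n_apps / total_apps)  # log prior
        for call in calls:
            if call in model["vocab"]:
                score += math.log((counts[call] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical toy data: categories and API usage are invented for illustration.
TRAINING = [
    ("database", ["java.sql.Connection", "java.sql.Statement", "java.sql.ResultSet"]),
    ("database", ["java.sql.DriverManager", "java.sql.Connection"]),
    ("networking", ["java.net.Socket", "java.net.URL", "java.io.InputStream"]),
    ("networking", ["java.net.ServerSocket", "java.net.Socket"]),
]

model = train(TRAINING)
print(classify(model, ["java.sql.Connection", "java.sql.ResultSet"]))
```

Because the features are API references rather than identifiers or comments, the same pipeline would apply to an application available only as bytecode, which is the property the abstract emphasizes.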
Pages: 582-618
Number of pages: 36