On using machine learning to automatically classify software applications into domain categories

Authors
Mario Linares-Vásquez
Collin McMillan
Denys Poshyvanyk
Mark Grechanik
Affiliations
[1] The College of William and Mary
[2] University of Notre Dame
[3] University of Illinois at Chicago
Source
Empirical Software Engineering, 2014, 19(3)
Keywords
Closed-source; Open-source; Software categorization; Machine learning
Abstract
Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and to predict maintenance problems in software projects. Manual categorization is expensive, tedious, and laborious, which is why automatic categorization approaches are gaining widespread importance. Unfortunately, for various legal and organizational reasons, the applications' source code is often unavailable, which makes it difficult to categorize these applications automatically. In this paper, we propose a novel approach that uses Application Programming Interface (API) calls from third-party libraries to automatically categorize the software applications that make those calls. Our approach is general: because API calls can be extracted from both source code and bytecode, it allows different categorization algorithms to be applied to repositories that contain source code, bytecode, or both. We compare our approach to a state-of-the-art approach that uses machine learning algorithms for software categorization, and we conduct experiments on two large Java repositories: an open-source repository containing 3,286 projects and a closed-source repository with 745 applications whose source code was not available. Our contribution is twofold: we propose a new approach that makes it possible to categorize software projects without any source code, using a small number of API calls as attributes, and we carry out a comprehensive empirical evaluation of automatic categorization approaches.
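The core idea of the abstract, treating the API calls an application makes as the attributes a classifier learns from, can be illustrated with a small sketch. This is not the authors' pipeline: it assumes the API calls have already been extracted (from source code or bytecode) as fully qualified method-name strings, and it uses a multinomial naive Bayes classifier with Laplace smoothing as a stand-in for whichever learning algorithm is plugged in. The class name `NaiveBayesCategorizer`, the sample API-call lists, and the category labels "Database" and "GUI" are hypothetical.

```python
import math
from collections import Counter, defaultdict


def api_features(calls):
    """Bag-of-API-calls: count how often each API call occurs in one application."""
    return Counter(calls)


class NaiveBayesCategorizer:
    """Hypothetical multinomial naive Bayes over API-call counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, apps, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for calls, label in zip(apps, labels):
            feats = api_features(calls)
            self.token_counts[label].update(feats)
            self.vocab.update(feats)
        # Total number of API-call occurrences observed per category.
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        self.n_apps = len(labels)
        return self

    def predict(self, calls):
        feats = api_features(calls)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in sorted(self.class_counts):  # sorted for deterministic tie-breaking
            score = math.log(self.class_counts[c] / self.n_apps)  # log prior
            denom = self.total_tokens[c] + self.alpha * v
            for api, n in feats.items():
                if api not in self.vocab:
                    continue  # ignore API calls never seen during training
                score += n * math.log((self.token_counts[c][api] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical training data: each application is represented only by its API calls.
train = [
    ["java.sql.Connection.prepareStatement", "java.sql.ResultSet.next"],
    ["java.sql.DriverManager.getConnection", "java.sql.ResultSet.next"],
    ["javax.swing.JFrame.setVisible", "javax.swing.JButton.addActionListener"],
    ["javax.swing.JPanel.add", "javax.swing.JFrame.setVisible"],
]
labels = ["Database", "Database", "GUI", "GUI"]

model = NaiveBayesCategorizer().fit(train, labels)
print(model.predict(["java.sql.ResultSet.next", "java.sql.Connection.close"]))  # Database
```

Because the classifier only ever sees API-call names, the same feature vectors could in principle be built from bytecode when source code is unavailable, which is the property the abstract emphasizes; swapping in a different learner would leave `api_features` unchanged.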
Pages
582-618
Citation
Linares-Vásquez, M., McMillan, C., Poshyvanyk, D., & Grechanik, M. (2014). On using machine learning to automatically classify software applications into domain categories. Empirical Software Engineering, 19(3), 582-618.