Automatic Software Categorization Using Ensemble Methods and Bytecode Analysis

Cited by: 3
Authors
Catal, Cagatay [1]
Tugul, Serkan [1]
Akpinar, Basar [1]
Affiliation
[1] Istanbul Kultur Univ, Dept Comp Engn, Atakoy Campus Bakirkoy, TR-34156 Istanbul, Turkey
Keywords
Software categorization; machine learning; software repository; bytecode; TOPICS
DOI
10.1142/S0218194017500425
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software repositories contain thousands of applications, and manually categorizing these applications into domain categories is expensive and time-consuming. In this study, we investigate an ensemble-of-classifiers approach to automatic software categorization when source code is not available. To this end, we used three data sets (package level, class level, and method level) covering 745 closed-source Java applications from the Sharejar repository. We applied the Vote, AdaBoost, and Bagging ensemble methods, with Support Vector Machines, Naive Bayes, J48, IBk, and Random Forest as base classifiers. The best performance was achieved by the Vote algorithm, whose base classifiers were AdaBoost with J48, AdaBoost with Random Forest, and Random Forest. We showed that Vote combined with the method-level attributes provides the best performance for automatic software categorization. These results demonstrate that the proposed approach can effectively categorize applications into domain categories in the absence of source code.
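A Vote meta-classifier merges the predictions of its base classifiers into one decision. The abstract does not say which combination rule was used. The classifier names (Vote, J48, IBk) match those in the Weka toolkit, whose Vote meta-classifier supports several rules, including majority voting and averaging of class probabilities.

The Python sketch below is not the authors' implementation. It illustrates those two rules for a single application. The function names, category labels ("Games", "Networking") and probability values are hypothetical and were invented for illustration.

```python
from collections import Counter
from typing import Dict, Sequence

# A class-probability distribution produced by one base classifier:
# category name -> probability. (Hypothetical representation.)
Distribution = Dict[str, float]


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the category predicted by the most base classifiers.

    Ties go to the tied category that appears first in `predictions`,
    an arbitrary choice made for this sketch.
    """
    if not predictions:
        raise ValueError("no predictions given")
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)


def average_of_probabilities(distributions: Sequence[Distribution]) -> str:
    """Average each category's probability over all base classifiers and
    return the category with the highest mean (ties: alphabetical order)."""
    if not distributions:
        raise ValueError("no distributions given")
    labels = sorted({label for d in distributions for label in d})
    mean = {
        label: sum(d.get(label, 0.0) for d in distributions) / len(distributions)
        for label in labels
    }
    return max(labels, key=lambda label: mean[label])


if __name__ == "__main__":
    # Hypothetical outputs of three base classifiers for one application,
    # standing in for AdaBoost+J48, AdaBoost+RandomForest and RandomForest.
    votes = ["Games", "Networking", "Games"]
    print(majority_vote(votes))  # Games

    dists = [
        {"Games": 0.6, "Networking": 0.4},
        {"Games": 0.3, "Networking": 0.7},
        {"Games": 0.55, "Networking": 0.45},
    ]
    print(average_of_probabilities(dists))  # Networking
```

With these invented numbers, the two rules disagree. Two of the three classifiers favour Games. However, the second classifier's confident Networking prediction raises that category's mean probability to about 0.517, against about 0.483 for Games. This is one reason the choice of combination rule matters when reporting ensemble results.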
Pages: 1129-1144
Number of pages: 16