Automatic Software Categorization Using Ensemble Methods and Bytecode Analysis

Cited by: 3
Authors
Catal, Cagatay [1]
Tugul, Serkan [1]
Akpinar, Basar [1]
Affiliations
[1] Istanbul Kultur Univ, Dept Comp Engn, Atakoy Campus Bakirkoy, TR-34156 Istanbul, Turkey
Keywords
Software categorization; machine learning; software repository; bytecode; TOPICS;
DOI
10.1142/S0218194017500425
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software repositories consist of thousands of applications and the manual categorization of these applications into domain categories is very expensive and time-consuming. In this study, we investigate the use of an ensemble of classifiers approach to solve the automatic software categorization problem when the source code is not available. Therefore, we used three data sets (package level/class level/method level) that belong to 745 closed-source Java applications from the Sharejar repository. We applied the Vote algorithm, AdaBoost, and Bagging ensemble methods and the base classifiers were Support Vector Machines, Naive Bayes, J48, IBk, and Random Forests. The best performance was achieved when the Vote algorithm was used. The base classifiers of the Vote algorithm were AdaBoost with J48, AdaBoost with Random Forest, and Random Forest algorithms. We showed that the Vote approach with method attributes provides the best performance for automatic software categorization; these results demonstrate that the proposed approach can effectively categorize applications into domain categories in the absence of source code.
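The Vote step described in the abstract can be illustrated with a minimal sketch. It assumes the average-of-probabilities combination rule, which the abstract does not specify, and it uses made-up class probabilities in place of trained models. The category names ("Games", "Utilities", "Multimedia") and the helper `vote_average` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def vote_average(distributions):
    """Combine base-classifier outputs by averaging per-class probabilities.

    `distributions` is a list of dicts mapping a category label to the
    probability one base classifier assigns to it. Returns the winning
    label and the averaged distribution.
    """
    totals = defaultdict(float)
    for dist in distributions:
        for label, prob in dist.items():
            totals[label] += prob
    count = len(distributions)
    averaged = {label: total / count for label, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


# Hypothetical outputs of the three base classifiers named in the abstract
# for a single application (values are invented, not results from the paper).
base_outputs = {
    "AdaBoost with J48": {"Games": 0.6, "Utilities": 0.3, "Multimedia": 0.1},
    "AdaBoost with Random Forest": {"Games": 0.3, "Utilities": 0.5, "Multimedia": 0.2},
    "Random Forest": {"Games": 0.5, "Utilities": 0.3, "Multimedia": 0.2},
}

label, averaged = vote_average(list(base_outputs.values()))
print(label)
```

With these invented inputs, two of the three base classifiers favor "Games" and its averaged probability is the highest, so the ensemble predicts "Games" even though one base classifier disagrees.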
Pages: 1129-1144
Page count: 16