Automatic Software Categorization Using Ensemble Methods and Bytecode Analysis

Cited by: 3
Authors
Catal, Cagatay [1]
Tugul, Serkan [1]
Akpinar, Basar [1]
Affiliations
[1] Istanbul Kultur Univ, Dept Comp Engn, Atakoy Campus Bakirkoy, TR-34156 Istanbul, Turkey
Keywords
Software categorization; machine learning; software repository; bytecode; TOPICS;
DOI
10.1142/S0218194017500425
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software repositories contain thousands of applications, and manually categorizing these applications into domain categories is expensive and time-consuming. In this study, we investigate an ensemble-of-classifiers approach to automatic software categorization when the source code is not available. To this end, we used three data sets (package level, class level, and method level) belonging to 745 closed-source Java applications from the Sharejar repository. We applied the Vote, AdaBoost, and Bagging ensemble methods, with Support Vector Machines, Naive Bayes, J48, IBk, and Random Forests as base classifiers. The best performance was achieved with the Vote algorithm, whose base classifiers were AdaBoost with J48, AdaBoost with Random Forest, and Random Forest. The Vote approach with method-level attributes provided the best performance for automatic software categorization; these results indicate that the proposed approach can effectively categorize applications into domain categories in the absence of source code.
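The combination step of a Vote-style ensemble can be illustrated with a minimal sketch. The abstract does not state which combination rule was used, so the code below shows two common ones (average of probabilities and majority voting) purely as an illustration. The category names and probability values are hypothetical and are not taken from the study; the function names are likewise assumptions made for this example.

```python
from collections import Counter


def average_of_probabilities(distributions):
    """Average the class-probability dicts produced by the base classifiers."""
    classes = set().union(*distributions)
    n = len(distributions)
    return {c: sum(d.get(c, 0.0) for d in distributions) / n for c in classes}


def majority_vote(distributions):
    """Let each base classifier vote for its most probable class."""
    return Counter(max(d, key=d.get) for d in distributions)


def predict(distributions, rule="average"):
    """Return the ensemble's predicted class under the chosen rule."""
    if rule == "average":
        combined = average_of_probabilities(distributions)
        return max(combined, key=combined.get)
    return majority_vote(distributions).most_common(1)[0][0]


# Hypothetical outputs of three base classifiers for a single application.
# The numbers and category names are made up for illustration only.
example = [
    {"game": 0.6, "utility": 0.3, "multimedia": 0.1},  # e.g. AdaBoost with J48
    {"game": 0.2, "utility": 0.7, "multimedia": 0.1},  # e.g. AdaBoost with Random Forest
    {"game": 0.5, "utility": 0.4, "multimedia": 0.1},  # e.g. Random Forest
]

print(predict(example, rule="average"))
print(predict(example, rule="majority"))
```

In this made-up case the two rules disagree: two of the three classifiers favor one category, while the averaged probabilities favor another, which is why the choice of combination rule matters when reproducing such an ensemble.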
Pages: 1129-1144
Page count: 16