Automatic Software Categorization Using Ensemble Methods and Bytecode Analysis

Cited by: 3
Authors
Catal, Cagatay [1 ]
Tugul, Serkan [1 ]
Akpinar, Basar [1 ]
Affiliations
[1] Istanbul Kultur Univ, Dept Comp Engn, Atakoy Campus Bakirkoy, TR-34156 Istanbul, Turkey
Keywords
Software categorization; machine learning; software repository; bytecode; TOPICS;
DOI
10.1142/S0218194017500425
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software repositories contain thousands of applications, and manually assigning these applications to domain categories is expensive and time-consuming. In this study, we investigate an ensemble-of-classifiers approach to automatic software categorization when source code is not available. We used three data sets (package level, class level, and method level) derived from 745 closed-source Java applications in the Sharejar repository. We applied the Vote, AdaBoost, and Bagging ensemble methods with Support Vector Machines, Naive Bayes, J48, IBk, and Random Forests as base classifiers. The best performance was achieved with the Vote algorithm, whose base classifiers were AdaBoost with J48, AdaBoost with Random Forest, and Random Forest. The Vote approach with method-level attributes provided the best performance, which demonstrates that the proposed approach can effectively categorize applications into domain categories in the absence of source code.
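As an illustration of how a Vote-style ensemble combines its base classifiers, the following minimal sketch shows two common combination rules: majority voting over predicted labels and averaging of class probabilities. The abstract does not state which rule the authors used, and the category names and probability values below are hypothetical, chosen only to show that the two rules can disagree.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most base classifiers.

    Ties are broken in favour of the label that appears first in `labels`.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def average_of_probabilities(distributions):
    """Average the per-class probabilities and return (top class, averages)."""
    classes = list(distributions[0])
    n = len(distributions)
    averaged = {c: sum(d[c] for d in distributions) / n for c in classes}
    return max(averaged, key=averaged.get), averaged


# Hypothetical outputs of three base classifiers for one application.
# The names stand in for AdaBoost+J48, AdaBoost+Random Forest, and
# Random Forest; the numbers are made up for illustration.
adaboost_j48 = {"games": 0.6, "utilities": 0.3, "multimedia": 0.1}
adaboost_rf = {"games": 0.2, "utilities": 0.7, "multimedia": 0.1}
random_forest = {"games": 0.5, "utilities": 0.4, "multimedia": 0.1}
outputs = [adaboost_j48, adaboost_rf, random_forest]

# Each classifier's own top prediction, used by majority voting.
hard_labels = [max(d, key=d.get) for d in outputs]

voted_label = majority_vote(hard_labels)
averaged_label, averaged = average_of_probabilities(outputs)

print("majority vote:", voted_label)
print("average of probabilities:", averaged_label)
```

In this toy case two of the three classifiers pick "games", so majority voting returns "games", while the averaged probabilities favour "utilities" because one classifier is highly confident in it.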
Pages: 1129-1144
Page count: 16
Related papers
50 in total
  • [21] Automatic Text Categorization using NTC
    Jo, Taeho
    NDT: 2009 FIRST INTERNATIONAL CONFERENCE ON NETWORKED DIGITAL TECHNOLOGIES, 2009, : 26 - 31
  • [22] Categorization and Comparison of Accessibility Testing Methods for Software Development
    Bai, Aleksander
    Fuglerud, Kristin
    Skjerve, Rannveig A.
    Halbac, Till
    TRANSFORMING OUR WORLD THROUGH DESIGN, DIVERSITY AND EDUCATION, 2018, 256 : 821 - 831
  • [23] Text categorization methods for automatic estimation of verbal intelligence
    Fernandez-Martinez, Fernando
    Zablotskaya, Kseniya
    Minker, Wolfgang
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (10) : 9807 - 9820
  • [24] Based on Multi-Features and Clustering Ensemble Method for Automatic Malware Categorization
    Zhang, Yunan
    Rong, Chenghao
    Huang, Qingjia
    Wu, Yang
    Yang, Zeming
    Jiang, Jianguo
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, 2017, : 73 - 82
  • [25] Automatic sleep scoring using statistical features in the EMD domain and ensemble methods
    Hassan, Ahnaf Rashik
    Bhuiyan, Mohammed Imamul Hassan
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2016, 36 (01) : 248 - 255
  • [26] LACTA: An Enhanced Automatic Software Categorization on the Native Code of Android Applications
    Yang, Cheng-Zen
    Tu, Ming-Hsuan
    INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, IMECS 2012, VOL I, 2012, : 769 - 773
  • [27] Automatic Sign Categorization using Visual Data
    Hruz, Marek
    ASSETS 11: PROCEEDINGS OF THE 13TH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, 2011, : 229 - 230
  • [28] Using kNN model for automatic text categorization
    Guo, GD
    Wang, H
    Bell, D
    Bi, YX
    Greer, K
    SOFT COMPUTING, 2006, 10 (05) : 423 - 430
  • [29] Using Testing Trace for Automatic User Categorization
    Li, J. Jenny
    Weiss, David M.
    2009 ICSE WORKSHOP ON AUTOMATION OF SOFTWARE TEST, 2009, : 144 - 148
  • [30] Using kNN model for automatic text categorization
    Gongde Guo
    Hui Wang
    David Bell
    Yaxin Bi
    Kieran Greer
    Soft Computing, 2006, 10 : 423 - 430