An empirical study for software change prediction using imbalanced data

Cited by: 48
Authors
Malhotra, Ruchika [1]
Khanna, Megha [2]
Affiliations
[1] Delhi Technol Univ, Dept Software Engn, Delhi, India
[2] Delhi Technol Univ, Delhi, India
Keywords
Change proneness; Data sampling; Empirical validation; Imbalanced learning; MetaCost learners; Object-oriented metrics; STATIC CODE ATTRIBUTES; CHANGE-PRONE CLASSES; FAULT-PRONENESS; METRICS; CLASSIFICATION; FRAMEWORK; QUALITY; SUITE
DOI
10.1007/s10664-016-9488-7
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Software change prediction is crucial for efficiently planning resource allocation during the testing and maintenance phases of software. Moreover, correctly identifying change-prone classes in the early phases of the software development life cycle helps in developing cost-effective, good-quality and maintainable software. An effective software change prediction model should recognize both change-prone and not change-prone classes with equally high accuracy. However, this is often not the case, as software practitioners frequently have to deal with imbalanced data sets in which instances of one class greatly outnumber those of the other. In such a scenario, the minority class is not predicted accurately, leading to strategic losses. This study evaluates a number of techniques for handling imbalanced data sets, using various data sampling methods and MetaCost learners, on six open-source data sets. The results of the study advocate the use of the resample with replacement sampling method for effective imbalanced learning.
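The abstract singles out resample with replacement as the preferred sampling method. As a rough illustration only, the Python sketch below draws a bootstrap sample (with replacement) whose class distribution is shifted toward uniform, so that the minority change-prone classes are no longer swamped by the majority. The helper `resample_with_replacement` and its `bias_to_uniform` parameter are hypothetical names; this is one plausible reading of the technique, not the authors' exact procedure or tool configuration.

```python
import random
from collections import Counter, defaultdict


def resample_with_replacement(rows, labels, bias_to_uniform=1.0, seed=0):
    """Hypothetical helper: draw a bootstrap sample (with replacement) whose
    class distribution is moved toward uniform.

    bias_to_uniform = 0.0 keeps each class's original share of the data;
    bias_to_uniform = 1.0 gives every class an equal share.
    """
    if not 0.0 <= bias_to_uniform <= 1.0:
        raise ValueError("bias_to_uniform must be in [0, 1]")
    rng = random.Random(seed)

    # Group instances by class label (e.g. 1 = change-prone, 0 = not change-prone).
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    n = len(rows)
    k = len(by_class)
    sampled_rows, sampled_labels = [], []
    for label, members in by_class.items():
        original_share = len(members) / n
        # Linear blend between the class's original share and the uniform share 1/k.
        target_share = (1.0 - bias_to_uniform) * original_share + bias_to_uniform / k
        target_count = round(target_share * n)
        # Draw with replacement, so minority instances may appear many times.
        for _ in range(target_count):
            sampled_rows.append(rng.choice(members))
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


if __name__ == "__main__":
    # Toy data (made up): 90 not-change-prone classes (label 0) and
    # 10 change-prone classes (label 1), each with one metric value.
    rows = [[float(i)] for i in range(100)]
    labels = [0] * 90 + [1] * 10
    new_rows, new_labels = resample_with_replacement(rows, labels)
    print("before:", Counter(labels))
    print("after: ", Counter(new_labels))
```

Because of rounding, the resampled total can differ from the original size by one when there are more than two classes. In an evaluation setup, sampling of this kind is normally applied to the training folds only, so that the test data keep the original class ratio.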
Source: Empirical Software Engineering, 2017, vol. 22, pp. 2806-2851
Number of pages: 46