A novel under sampling strategy for efficient software defect analysis of skewed distributed data

Cited by: 0
Authors
K. Nitalaksheswara Rao
Ch. Satyananda Reddy
Affiliation
[1] Andhra University, Department of Computer Science and Systems Engineering
Source
Evolving Systems | 2020 / Vol. 11
Keywords
Software defects analysis; Classification; Decision tree; Class imbalance learning; Under sampling;
DOI
Not available
CLC number
Subject classification number
Abstract
The software quality development process is a continuous process that begins with identifying a reliable fault detection technique. Implementing an effective fault detection technique depends on the properties of the dataset, such as domain information, the characteristics of the input data, and its complexity. Early detection of defective modules gives developers more time to allocate resources effectively and deliver quality software on time. Because software defect datasets are class-imbalanced, existing techniques fail to identify all of the defective modules. Misclassifying defective modules in software engineering datasets causes unexpected losses for software developers. To classify class-imbalanced software datasets efficiently, we propose a novel approach, called the under sampling strategy, which removes the less prominent instances from the majority subset. The experimental results confirm that the proposed approach predicts error-prone modules more accurately, using fewer and simpler rules.
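The abstract does not say how the "less prominent" majority instances are identified. The sketch below is one possible reading under a stated assumption: it swaps in the NearMiss-1 rule (keep the clean modules whose mean distance to their k nearest defective modules is smallest, drop the rest) and first computes the imbalance ratio that motivates under sampling. The function names `imbalance_ratio` and `nearmiss1_undersample`, the default `k = 3`, and the toy metric values are hypothetical; this is not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def imbalance_ratio(labels: Sequence[int], defective_label: int = 1) -> float:
    """Ratio of majority-class to minority-class counts for binary labels."""
    defective = sum(1 for t in labels if t == defective_label)
    clean = len(labels) - defective
    if defective == 0 or clean == 0:
        raise ValueError("both classes must be present")
    return max(defective, clean) / min(defective, clean)


def nearmiss1_undersample(
    X: Sequence[Row], y: Sequence[int], k: int = 3, minority_label: int = 1
) -> Tuple[List[Row], List[int]]:
    """NearMiss-1 under sampling, used here as an ASSUMED stand-in.

    Each majority instance is scored by its mean Euclidean distance to its k
    nearest minority instances. The closest majority instances are kept until
    both classes have the same size; the rest are treated as the "less
    prominent" instances and dropped.
    """
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    if not minority:
        raise ValueError("no minority (defective) instances")
    k = min(k, len(minority))

    def score(row: Row) -> float:
        # Mean distance from this clean module to its k nearest defective modules.
        nearest = sorted(math.dist(row, m) for m, _ in minority)[:k]
        return sum(nearest) / k

    # Keep the lowest-scoring majority instances, one per minority instance.
    kept = sorted(majority, key=lambda pair: score(pair[0]))[: len(minority)]
    balanced = minority + kept
    return [x for x, _ in balanced], [t for _, t in balanced]


# Hypothetical toy data: two metrics per module (e.g. size, complexity);
# label 1 = defective (minority), 0 = clean (majority).
X = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.9),
     (9.0, 9.0), (0.9, 1.1), (8.5, 9.2), (1.1, 1.0)]
y = [1, 1, 0, 0, 0, 0, 0, 0]

print(imbalance_ratio(y))  # 3.0: three clean modules per defective one
X_bal, y_bal = nearmiss1_undersample(X, y)
print(X_bal, y_bal)
```

In this toy set only the two clean modules lying closest to the defective cluster survive, giving a 1:1 class split. Keeping boundary-adjacent majority instances is the NearMiss-1 design choice; a real pipeline would then train a classifier, such as the decision tree named in the keywords, on the balanced data.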
Pages: 119-131
Number of pages: 12
Related papers
50 in total
  • [41] RELIABILITY-ANALYSIS OF LARGE SOFTWARE SYSTEMS - DEFECT DATA MODELING
    LEVENDEL, Y
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1990, 16 (02) : 141 - 152
  • [42] Novel coordinated secondary voltage control strategy for efficient utilisation of distributed generations
    Alobeidli, Khaled
    El Moursi, Mohamed Shawky
    [J]. IET RENEWABLE POWER GENERATION, 2014, 8 (05) : 569 - 579
  • [43] A Workload Assignment Strategy for Efficient ROLAP Data Cube Computation in Distributed Systems
    Suh, Ilhyun
    Chung, Yon Dohn
    [J]. INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2016, 12 (03) : 51 - 71
  • [44] CDFRS: A scalable sampling approach for efficient big data analysis
    Cai, Yongda
    Wu, Dingming
    Sun, Xudong
    Wu, Siyue
    Xu, Jingsheng
    Huang, Joshua Zhexue
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (04)
  • [45] Data sampling approach using heuristic Learning Vector Quantization (LVQ) classifier for software defect prediction
    Amanullah, M.
    Ramya, S. Thanga
    Sudha, M.
    Pushparathi, V. P. Gladis
    Haldorai, Anandakumar
    Pant, Bhaskar
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (03) : 3867 - 3876
  • [46] DISTRIBUTED DATA-ANALYSIS IN COLLABORATIVE STUDIES - THE CARDIA STRATEGY
    PERKINS, L
    WAGENKNECHT, L
    CUTTER, G
    BIRCH, R
    BLANTON, M
    DYER, A
    [J]. CONTROLLED CLINICAL TRIALS, 1987, 8 (03) : 281 - 282
  • [47] Distributed Storage Strategy and Visual Analysis for Economic Big Data
    Chang, Xiangli
    Cui, Hailang
    [J]. JOURNAL OF MATHEMATICS, 2021, 2021
  • [48] SSFile: A novel column-store for efficient data analysis in Hadoop-based distributed systems
    Son, Jihoon
    Ryu, Hyoseok
    Yi, Sungmin
    Chung, Yon Dohn
    [J]. INFORMATION SCIENCES, 2015, 316 : 68 - 86
  • [49] Efficient Publication of Distributed and Overlapping Graph Data Under Differential Privacy
    Zheng, Xu
    Zhang, Lizong
    Li, Kaiyang
    Zeng, Xi
    [J]. TSINGHUA SCIENCE AND TECHNOLOGY, 2022, 27 (02) : 235 - 243