Grey Relational Analysis based k Nearest Neighbor Missing Data Imputation for Software Quality Datasets

Cited by: 9
Authors
Huang, Jianglin [1]
Sun, Hongyi [1]
Affiliations
[1] City Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Keywords
kNN; imputation; empirical software engineering estimation; missing data; COST ESTIMATION; DATA SETS; SELECTION; INFORMATION
DOI
10.1109/QRS.2016.20
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Software quality estimation is important yet difficult in software engineering studies. Historical quality datasets are used to build classification models that estimate fault proneness. However, missing values in these datasets severely degrade estimation ability and can therefore lead to inconclusive decision-making. Among single imputation approaches, k nearest neighbor (kNN) imputation is popular in empirical studies because of its relatively high accuracy. However, the optimal parameter setting for kNN imputation remains an open question among researchers. In this study, a novel incomplete-instance kNN imputation based on grey relational analysis is built for software quality data. It is evaluated on four quality datasets under different simulated missingness scenarios. The empirical results show that the proposed approach outperforms traditional kNN imputation and mean imputation in most cases. Moreover, classification accuracy can be maintained or even improved when this approach is used in classification tasks.
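The abstract names the technique but gives no formulas, so the following is only a minimal sketch of how a grey relational analysis based kNN imputation could look, not the authors' implementation. It assumes the textbook grey relational coefficient with distinguishing coefficient `rho = 0.5`, min-max scaling of each feature, and an unweighted mean of the k donors. The function names `grey_relational_grade` and `gra_knn_impute` are hypothetical. "Incomplete-instance" is read here as allowing rows that themselves contain missing values to act as donors, provided they have the needed feature observed.

```python
import numpy as np

def grey_relational_grade(target, candidates, rho=0.5):
    """Grey relational grade of each candidate row with respect to target.

    NaN marks a missing value; only features observed in both rows count.
    Rows sharing no observed feature with target get a grade of -inf.
    """
    diff = np.abs(candidates - target)       # NaN where either side is missing
    observed = ~np.isnan(diff)
    if not observed.any():
        return np.full(len(candidates), -np.inf)
    d_min = diff[observed].min()
    d_max = diff[observed].max()
    if d_max == 0:
        coef = np.ones_like(diff)            # all shared features identical
    else:
        coef = (d_min + rho * d_max) / (diff + rho * d_max)
    coef = np.where(observed, coef, 0.0)
    counts = observed.sum(axis=1)
    return np.where(counts > 0, coef.sum(axis=1) / np.maximum(counts, 1), -np.inf)

def gra_knn_impute(X, k=3, rho=0.5):
    """Fill NaNs using the k rows with the highest grey relational grade."""
    X = np.asarray(X, dtype=float)
    lo = np.nanmin(X, axis=0)
    hi = np.nanmax(X, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Z = (X - lo) / span                      # scaled copy used for similarity only
    out = X.copy()
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        others = np.delete(np.arange(len(X)), i)
        grades = grey_relational_grade(Z[i], Z[others], rho)
        for j in np.flatnonzero(np.isnan(X[i])):
            # Any row with feature j observed may donate, complete or not.
            ok = ~np.isnan(X[others, j]) & np.isfinite(grades)
            donors = others[ok]
            if donors.size == 0:
                continue                     # no usable donor: leave the gap
            top = donors[np.argsort(grades[ok])[::-1][:k]]
            out[i, j] = X[top, j].mean()
    return out

data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.0, np.nan]])
filled = gra_knn_impute(data, k=1)
```

In this toy call the last row shares its first feature exactly with the second row, so with `k=1` the second row is the only donor. A grade-weighted mean of the donors would be an equally plausible design choice.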
Pages: 86-91
Page count: 6