Missing Data Imputation Based on Low-Rank Recovery and Semi-Supervised Regression for Software Effort Estimation

Cited by: 18
Authors
Jing, Xiao-Yuan [1,2]
Qi, Fumin [1]
Wu, Fei [1,2]
Xu, Baowen [1,3]
Affiliations
[1] Wuhan University, State Key Laboratory of Software Engineering, School of Computer, Wuhan, Hubei, People's Republic of China
[2] Nanjing University of Posts and Telecommunications, School of Automation, Nanjing, Jiangsu, People's Republic of China
[3] Nanjing University, Department of Computer Science and Technology, Nanjing, Jiangsu, People's Republic of China
Keywords
Software effort estimation; Missing data problem; Drive factor missing case; Effort label missing case; Low-rank recovery and semi-supervised regression imputation (LRSRI); Cost estimation; Data sets; Models
DOI
10.1145/2884781.2884827
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Software effort estimation (SEE) is a crucial step in software development. Missing effort data are common in real-world data collection. Existing SEE methods handle missing data with deletion, ignoring, or imputation strategies, of which imputation has been found more helpful for improving estimation performance. However, current SEE imputation methods rely on classical imputation techniques, each of which has its own disadvantages and may not be appropriate for effort data. In this paper, we aim to provide an effective solution to the missing effort data problem. Incompleteness takes two forms: the drive factor missing case, in which some drive factor values of a project are absent, and the effort label missing case, in which a project's effort value is absent. We introduce the low-rank recovery technique to handle the drive factor missing case, and we employ the semi-supervised regression technique to impute values in the effort label missing case. Combining the two, we propose a novel effort data imputation approach named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that (1) the proposed approach achieves better imputation results than the other methods compared, and (2) data imputed by our approach work well with multiple estimators.
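The abstract describes a two-stage idea: recover missing drive factors with a low-rank model, then fill missing effort labels with semi-supervised regression. The sketch below illustrates that idea under stated assumptions; it is not the LRSRI algorithm from the paper, whose exact objective and solver are not given in this record. The low-rank step is swapped in as a simple ridge-regularised alternating least squares (ALS) matrix completion fitted only on observed entries, and the semi-supervised step as a self-training k-nearest-neighbour regressor that pseudo-labels the project whose labeled neighbours agree most. All function names (`low_rank_complete`, `self_training_knn`, `impute_effort_data`) and parameters (`rank`, `iters`, `lam`, `k`) are hypothetical, and missing values are represented as `None`.

```python
import math


def _predict(U, V, i, j):
    # Reconstructed value of entry (i, j): dot product of latent factors.
    return sum(a * b for a, b in zip(U[i], V[j]))


def _residual(X, U, V, i, j, r):
    # Observed value minus the fit of every component except component r.
    return X[i][j] - _predict(U, V, i, j) + U[i][r] * V[j][r]


def low_rank_complete(X, rank=1, iters=300, lam=1e-6):
    """Hypothetical low-rank recovery step: fit X ~ U V^T on the observed
    (non-None) entries by cyclic ridge-regularised ALS, then fill the gaps."""
    n, m = len(X), len(X[0])
    U = [[1.0 / (r + 1) for r in range(rank)] for _ in range(n)]
    V = [[1.0 / (r + 1) for r in range(rank)] for _ in range(m)]
    for _ in range(iters):
        for r in range(rank):
            # Update component r of every row (project) factor, V held fixed.
            for i in range(n):
                num = den = 0.0
                for j in range(m):
                    if X[i][j] is not None:
                        num += _residual(X, U, V, i, j, r) * V[j][r]
                        den += V[j][r] ** 2
                U[i][r] = num / (den + lam)
            # Update component r of every column (drive factor) factor, U held fixed.
            for j in range(m):
                num = den = 0.0
                for i in range(n):
                    if X[i][j] is not None:
                        num += _residual(X, U, V, i, j, r) * U[i][r]
                        den += U[i][r] ** 2
                V[j][r] = num / (den + lam)
    # Observed entries are kept as-is; only missing ones are replaced.
    return [[X[i][j] if X[i][j] is not None else _predict(U, V, i, j)
             for j in range(m)] for i in range(n)]


def self_training_knn(X, y, k=3):
    """Hypothetical semi-supervised step: repeatedly pseudo-label the
    unlabeled project whose k labeled neighbours agree most closely."""
    y = list(y)
    labeled = [i for i, v in enumerate(y) if v is not None]
    unlabeled = [i for i, v in enumerate(y) if v is None]
    if not labeled:
        raise ValueError("at least one effort label is required")
    while unlabeled:
        best = None  # (neighbour label variance, index, predicted effort)
        for u in unlabeled:
            nbrs = sorted(labeled, key=lambda j, u=u: (math.dist(X[u], X[j]), j))[:k]
            vals = [y[j] for j in nbrs]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            if best is None or var < best[0]:
                best = (var, u, mean)
        _, u, mean = best
        y[u] = mean  # the pseudo-label joins the training labels
        labeled.append(u)
        unlabeled.remove(u)
    return y


def impute_effort_data(X, y, rank=1, k=3):
    # Stage 1: recover drive factors; stage 2: fill effort labels.
    X_full = low_rank_complete(X, rank=rank)
    return X_full, self_training_knn(X_full, y, k=k)


if __name__ == "__main__":
    X = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, None]]
    print(round(low_rank_complete(X, rank=1)[3][2], 2))
    print(self_training_knn([[1.0], [2.0], [3.0], [4.0], [5.0]],
                            [10.0, 20.0, None, 40.0, 50.0], k=2))
    # prints [10.0, 20.0, 30.0, 40.0, 50.0]
```

Filling drive factors first means the label step can measure distances on complete feature vectors. In practice the drive factors would usually be normalized before both stages, since the factorization and kNN distances are scale-sensitive; the `rank` and `k` values above are illustrative choices, not values from the paper.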
Pages: 607-618
Number of pages: 12