Categorical missing data imputation for software cost estimation by multinomial logistic regression

Cited by: 44

Authors:

Sentas, P [1]

Angelis, L [1]

Affiliation:

[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece

Source:

JOURNAL OF SYSTEMS AND SOFTWARE | 2006 / Vol. 79 / Issue 03

Keywords:

software effort prediction; cost estimation; missing data; imputation; multinomial logistic regression;

DOI:

10.1016/j.jss.2005.02.026

Chinese Library Classification:

TP31 [Computer software];

Subject classification code:

081202 ; 0835 ;

Abstract:

A common problem in software cost estimation is the manipulation of incomplete or missing data in databases used for the development of prediction models. In such cases, the most popular and simple method of handling missing data is to ignore either the projects or the attributes with missing observations. This technique causes the loss of valuable information and therefore may lead to inaccurate cost estimation models. On the other hand, there are various imputation methods used to estimate the missing values in a data set. These methods are applied mainly to numerical data and produce continuous estimates. However, it is well known that the majority of the cost data sets contain software projects with mostly categorical attributes with many missing values. It is therefore reasonable to use some estimating method producing categorical rather than continuous values. The purpose of this paper is to investigate the possibility of using such a method for estimating categorical missing values in software cost databases. Specifically, the method known as multinomial logistic regression (MLR) is suggested for imputation and is applied to projects of the ISBSG multi-organizational software database. Comparisons of MLR with other techniques for handling missing data, such as listwise deletion (LD), mean imputation (MI), expectation maximization (EM) and regression imputation (RI) under different patterns and percentages of missing data, show the high efficiency of the proposed method. (C) 2005 Elsevier Inc. All rights reserved.

Pages: 404-414

Page count: 11

50 items in total

[1] Multinomial logistic regression with missing outcome data: An application to cancer subtypes
Wang, Ching-Yun
Hsu, Li
[J]. STATISTICS IN MEDICINE, 2020, 39 (24) : 3299 - 3312
[2] Statistical micro matching using a multinomial logistic regression model for categorical data
Kim, Kangmin
Park, Mingue
[J]. COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2019, 26 (05) : 507 - 517
[3] The effect of high prevalence of missing data on estimation of the coefficients of a logistic regression model when using multiple imputation
Peter C. Austin
Stef van Buuren
[J]. BMC Medical Research Methodology, 22
[4] The effect of high prevalence of missing data on estimation of the coefficients of a logistic regression model when using multiple imputation
Austin, Peter C.
van Buuren, Stef
[J]. BMC MEDICAL RESEARCH METHODOLOGY, 2022, 22 (01)
[5] Imputation of missing dependent variable in binary logistic regression
Thammachoto, Tidarat
Samart, Klairung
[J]. MAEJO INTERNATIONAL JOURNAL OF SCIENCE AND TECHNOLOGY, 2024, 18 (01) : 61 - 74
[6] Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods
Lee, Shen-Ming
Le, Truong-Nhat
Tran, Phuoc-Loc
Li, Chin-Shang
[J]. COMPUTATIONAL STATISTICS, 2023, 38 (02) : 899 - 934
[7] Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods
Shen-Ming Lee
Truong-Nhat Le
Phuoc-Loc Tran
Chin-Shang Li
[J]. Computational Statistics, 2023, 38 : 899 - 934
[8] Simulation of multinomial probit probabilities and imputation of missing data
Lavy, V
Palumbo, M
Stern, S
[J]. ADVANCES IN ECONOMETRICS, VOL 13 1998, 1998, 13 : 145 - 179
[9] Bias correction in logistic regression with missing categorical covariates
Das, Ujjwal
Maiti, Tapabrata
Pradhan, Vivek
[J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2010, 140 (09) : 2478 - 2485
[10] MISSING DATA, IMPUTATION AND REGRESSION TREES
Loh, Wei-Yin
Zhang, Qiong
Zhang, Wenwen
Zhou, Peigen
[J]. STATISTICA SINICA, 2020, 30 (04) : 1697 - 1722