When Can We Ignore Missing Data in Model Training?

Authors
Zhen, Cheng [1]
Chabada, Amandeep Singh [1]
Termehchy, Arash [1]
Affiliations
[1] Oregon State Univ, Corvallis, OR 97331 USA
Keywords
data cleaning; machine learning; irrelevant and redundant data
DOI
10.1145/3595360.3595854
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imputing missing data is typically expensive, and as a result, people seek to avoid it when possible. To address this issue, we introduce a method that determines when data cleaning is unnecessary for machine learning (ML). If a model can minimize the loss function regardless of the missing data's actual values, then data cleaning is not required. We offer efficient algorithms for checking this condition in multiple ML problems, and by analyzing the algorithms, we show that data cleaning is unnecessary when dealing with irrelevant and redundant data. Our preliminary experiments demonstrate that our algorithms can significantly reduce cleaning costs compared to a benchmark method, without incurring much computational overhead in many cases.
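The condition stated in the abstract (a model minimizes the loss regardless of the missing values) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes ordinary least-squares linear regression with exactly one feature column that contains missing entries, and the names `solve_least_squares` and `cleaning_unnecessary` are hypothetical. Under these assumptions, a model that puts zero weight on the incomplete column is optimal for every possible repair when (a) the least-squares residual is zero on every row with a missing entry and (b) the residuals are orthogonal to the observed part of that column.

```python
from typing import List, Optional


def solve_least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    d = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(d)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(d)
    ]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("complete features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]


def cleaning_unnecessary(
    X_complete: List[List[float]],
    x_incomplete: List[Optional[float]],
    y: List[float],
    tol: float = 1e-9,
) -> bool:
    """Return True if zero weight on the incomplete column is optimal for every repair.

    X_complete holds the fully observed features (include a column of 1s for an
    intercept); x_incomplete is the single column with None marking missing entries.
    """
    w = solve_least_squares(X_complete, y)
    residuals = [
        t - sum(wi * xi for wi, xi in zip(w, row))
        for row, t in zip(X_complete, y)
    ]
    # (a) Rows with a missing entry must be fit exactly; otherwise some repair
    # value makes the gradient with respect to the incomplete feature non-zero.
    for r, v in zip(residuals, x_incomplete):
        if v is None and abs(r) > tol:
            return False
    # (b) The observed entries must be orthogonal to the residuals.
    dot = sum(r * v for r, v in zip(residuals, x_incomplete) if v is not None)
    return abs(dot) <= tol


# Toy data: y = 1 + 2 * x holds exactly, so the incomplete column is irrelevant.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
incomplete = [0.5, None, 2.0, None]
print(cleaning_unnecessary(X, incomplete, [1.0, 3.0, 5.0, 7.0]))
print(cleaning_unnecessary(X, incomplete, [1.0, 3.0, 5.0, 8.0]))
```

In the first call the complete features already explain the target exactly, so no imputed value can lower the loss further. In the second call a row with a missing entry has a non-zero residual, so the optimal model may depend on the repair and the check returns False.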
Pages: 4