When Can We Ignore Missing Data in Model Training?

Cited by: 0

Authors:

Zhen, Cheng [1]

Chabada, Amandeep Singh [1]

Termehchy, Arash [1]

Affiliations:

[1] Oregon State Univ, Corvallis, OR 97331 USA

Source:

PROCEEDINGS OF THE SEVENTH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM | 2023

Keywords:

data cleaning; machine learning; irrelevant and redundant data

DOI:

10.1145/3595360.3595854

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Imputing missing data is typically expensive, and as a result, people seek to avoid it when possible. To address this issue, we introduce a method that determines when data cleaning is unnecessary for machine learning (ML). If a model can minimize the loss function regardless of the missing data's actual values, then data cleaning is not required. We offer efficient algorithms for checking this condition in multiple ML problems, and by analyzing the algorithms, we show that data cleaning is unnecessary when dealing with irrelevant and redundant data. Our preliminary experiments demonstrate that our algorithms can significantly reduce cleaning costs compared to a benchmark method, without incurring much computational overhead in many cases.
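
The condition in the abstract can be illustrated for one simple case. The following is a minimal sketch, not the paper's algorithm: it assumes ordinary least-squares regression without an intercept, with missing feature values encoded as NaN, and the helper name zero_weight_model_is_certain is hypothetical. It fits a model on the fully observed columns only, gives every incomplete column a weight of zero, and then checks that the loss gradient with respect to those zero weights vanishes whatever values the missing entries take. That holds when the residual is zero on every row with a missing entry and is orthogonal to the observed part of each incomplete column. When the check passes, the fitted model minimizes the loss for every possible imputation, so no cleaning is needed for this model.

```python
import numpy as np

def zero_weight_model_is_certain(X, y, tol=1e-8):
    """Check a sufficient condition for skipping imputation in least squares.

    Hypothetical helper for illustration only. Returns (is_certain, w), where
    w puts zero weight on every column that contains a NaN.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(X)
    incomplete_cols = missing.any(axis=0)
    complete_cols = ~incomplete_cols

    # Fit least squares using only the fully observed columns.
    w = np.zeros(X.shape[1])
    if complete_cols.any():
        w_c, *_ = np.linalg.lstsq(X[:, complete_cols], y, rcond=None)
        w[complete_cols] = w_c
    residual = X[:, complete_cols] @ w[complete_cols] - y

    for j in np.flatnonzero(incomplete_cols):
        miss_j = missing[:, j]
        # The residual must vanish on rows where column j is missing,
        # otherwise some imputed value would make a nonzero weight pay off.
        if np.any(np.abs(residual[miss_j]) > tol):
            return False, w
        # The residual must be orthogonal to the observed part of column j.
        if abs(X[~miss_j, j] @ residual[~miss_j]) > tol:
            return False, w
    return True, w

# Column 1 has a missing value, but y is fully explained by column 0.
X = [[1.0, 5.0], [2.0, float("nan")], [3.0, 7.0]]
is_certain, w = zero_weight_model_is_certain(X, [2.0, 4.0, 6.0])
```

In this toy input the second column is irrelevant to the target, which matches the abstract's observation that cleaning is unnecessary for irrelevant and redundant data. If the check fails, the sketch says nothing either way beyond this zero-weight model.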

Pages: 4
