Addressing Bias from Non-Random Missing Attributes in Health Data

Cited by: 0

Authors:

Napoli, Nicholas J. [1]

Kotoriy, Madeline E. [2]

Barnhardt, William [3]

Young, Jeffrey S. [4]

Barnes, Laura E. [1]

Affiliations:

[1] Univ Virginia, Syst & Informat Engn, Charlottesville, VA 22904 USA

[2] Univ Virginia, Batten Sch Leadership & Publ Policy, Charlottesville, VA 22904 USA

[3] Univ Virginia Hlth Syst, Emergency Serv, Charlottesville, VA 22904 USA

[4] Univ Virginia, Dept Surg, Charlottesville, VA 22908 USA

Source:

2017 IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL & HEALTH INFORMATICS (BHI) | 2017

Keywords:

PERFORMANCE; TRAUMA;

DOI:

Not available

Chinese Library Classification number:

R-058

Subject classification number:

Abstract:

This paper aims to improve health outcomes research and data management practices. Typically, health care records are very large and cumbersome to manage, and the quality of the data is often overlooked because the volume is thought to be large enough to overcome issues arising from missing data. However, simply removing observations with missing data is problematic because the distribution of missing information is non-random, so the sample used for analysis becomes biased. We propose a method for evaluating and addressing bias in the data cleaning process. Specifically, we identify where bias exists within the data and address it by sub-sampling or discarding data. We present a case study analyzing data from a Level 1 trauma center to establish how bias arises in health registries and how this bias can have downstream implications for evaluating hospital performance. Our method uses a two-tailed z-test to compare subgroups in the data set, which demonstrates how missing data in these subgroups can lead to bias. We demonstrate how to localize the bias to particular subgroups and provide corrective actions to handle it. We also show how failure to account for bias can distort performance evaluation, illustrating the importance of the proposed method.
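
As an illustration of the subgroup comparison mentioned in the abstract, the sketch below applies a two-tailed z-test to the missingness rates of two subgroups. The abstract does not say which statistic the authors test, so the pooled two-proportion form used here is an assumption, and the function name `two_proportion_z_test`, the counts, and the 0.05 threshold are all hypothetical.

```python
import math


def two_proportion_z_test(missing_a, total_a, missing_b, total_b):
    """Two-tailed z-test for a difference in missingness rates.

    Assumption: a pooled two-proportion z-test; the paper's exact
    statistic is not given in the abstract.
    Returns (z, p_value).
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("subgroup sizes must be positive")

    rate_a = missing_a / total_a
    rate_b = missing_b / total_b

    # Pooled missingness rate under the null hypothesis of equal rates.
    pooled = (missing_a + missing_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # No variation at all (nothing or everything missing in both groups).
        return 0.0, 1.0

    z = (rate_a - rate_b) / std_err
    # Two-tailed p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: records missing an attribute in two subgroups.
z, p = two_proportion_z_test(missing_a=120, total_a=400,
                             missing_b=150, total_b=1000)
# Hypothetical significance threshold for flagging a subgroup.
flag_subgroup = p < 0.05
```

A subgroup flagged this way has a missingness rate that differs from the comparison group by more than chance would suggest, which is the situation in which dropping incomplete records could bias the remaining sample.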

Pages: 265-268

Page count: 4
