Addressing Bias from Non-Random Missing Attributes in Health Data

Times cited: 0
Authors
Napoli, Nicholas J. [1]
Kotoriy, Madeline E. [2]
Barnhardt, William [3]
Young, Jeffrey S. [4]
Barnes, Laura E. [1]
Affiliations
[1] Univ Virginia, Syst & Informat Engn, Charlottesville, VA 22904 USA
[2] Univ Virginia, Batten Sch Leadership & Publ Policy, Charlottesville, VA 22904 USA
[3] Univ Virginia Hlth Syst, Emergency Serv, Charlottesville, VA 22904 USA
[4] Univ Virginia, Dept Surg, Charlottesville, VA 22908 USA
Keywords
PERFORMANCE; TRAUMA
DOI
Not available
Chinese Library Classification (CLC) number
R-058
Abstract
This paper aims to improve health outcomes research and data management practices. Health care records are typically large and cumbersome to manage, and data quality is often overlooked because the volume is assumed to be large enough to overcome problems arising from missing data. However, simply removing observations with missing data is problematic: missing information is not distributed at random, so the sample retained for analysis becomes biased. We propose a method for evaluating and addressing bias in the data cleaning process. Specifically, we identify where bias exists within the data and address it by sub-sampling or discarding data. We present a case study analyzing data from a Level 1 trauma center to show how bias arises in health registries and how it can have downstream implications for evaluating hospital performance. Our method uses a two-tailed z-test to compare subgroups in the data set, demonstrating how missing data in these subgroups can lead to bias. We show how to localize the bias in particular subgroups and provide corrective actions to handle it. We also show how failing to account for bias can distort performance assessments, illustrating the importance of the proposed method.
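The abstract does not state exactly which proportions the two-tailed z-test compares. The following is a minimal sketch, assuming a pooled two-proportion z-test on the rate at which one attribute is missing inside a subgroup versus in all remaining records; a significant difference suggests that dropping incomplete records would shift that subgroup's share of the analysis sample. The record layout (a list of dicts), the helper names `two_proportion_z_test` and `flag_biased_subgroups`, and the default `alpha=0.05` are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ZTestResult:
    z: float
    p_value: float


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> ZTestResult:
    """Pooled two-proportion z-test with a two-tailed p-value.

    x1/n1 and x2/n2 count records with a missing attribute out of all records
    in two disjoint groups (hypothetical framing of the paper's subgroup test).
    """
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Everything missing or nothing missing: no evidence of a difference.
        return ZTestResult(z=0.0, p_value=1.0)
    z = (p1 - p2) / se
    # Two-tailed p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ZTestResult(z=z, p_value=p_value)


def flag_biased_subgroups(
    records: list[dict], subgroup_key: str, attribute: str, alpha: float = 0.05
) -> dict[str, ZTestResult]:
    """Flag subgroups whose missingness rate for `attribute` differs from the rest.

    A value counts as missing if the key is absent or its value is None.
    No multiple-comparison correction is applied (an assumption of this sketch).
    """
    flagged: dict[str, ZTestResult] = {}
    for group in sorted({r[subgroup_key] for r in records}):
        inside = [r for r in records if r[subgroup_key] == group]
        outside = [r for r in records if r[subgroup_key] != group]
        if not outside:
            continue  # a single subgroup has nothing to be compared against
        x1 = sum(r.get(attribute) is None for r in inside)
        x2 = sum(r.get(attribute) is None for r in outside)
        result = two_proportion_z_test(x1, len(inside), x2, len(outside))
        if result.p_value < alpha:
            flagged[group] = result
    return flagged
```

For example, with hypothetical toy data in which 40 of 100 transferred patients but only 10 of 200 directly admitted patients lack a value for a field named `gcs`, `flag_biased_subgroups(records, "arrival", "gcs")` flags both subgroups with z of roughly ±7.67, indicating that a complete-case analysis would under-represent transfers. When many subgroups are scanned at once, a correction such as Bonferroni (testing each at `alpha / k`) keeps the overall false-positive rate in check.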
Pages: 265-268
Number of pages: 4