Best Practices for Binary and Ordinal Data Analyses

Times cited: 34
Authors
Verhulst, Brad [1]
Neale, Michael C. [2]
Affiliations
[1] Texas A&M Univ, Dept Psychiat & Behav Hlth, College Stn, TX 77843 USA
[2] Virginia Commonwealth Univ, Virginia Inst Psychiat & Behav Genet, Richmond, VA 23284 USA
Keywords
Ordinal data; Pearson product-moment correlation; Polychoric correlation; Point biserial correlation; Tetrachoric correlation; Odds ratio; Prevalence; Structural equation; Concordance; Liability; Package
DOI
10.1007/s10519-020-10031-x
Chinese Library Classification (CLC) codes
B84 [Psychology]; C [Social Sciences, General]; Q98 [Anthropology]
Subject classification codes
03; 0303; 030303; 04; 0402
Abstract
The measurement of many human traits, states, and disorders begins with a set of questionnaire items. The response format for these items is often simply binary (e.g., yes/no) or ordered (e.g., high, medium, or low). During data analysis, the items are frequently summed or used to estimate factor scores. In clinical applications, such assessments are often non-normally distributed in the general population because many respondents are unaffected and therefore asymptomatic. As a result, in many cases these measures violate the statistical assumptions required for subsequent analyses. To reduce the influence of non-normality and quasi-continuous assessment, variables are frequently recoded into binary (affected/unaffected) or ordinal (mild/moderate/severe) diagnoses. Ordinal data therefore present challenges at multiple levels of analysis. Categorizing continuous variables into ordered categories typically reduces statistical power, which gives the data analyst an incentive to assume that the data are normally distributed even when they are not. Despite earlier conventions holding that, e.g., variables with more than 10 ordered categories may be regarded as continuous and analyzed as such, we show via simulation studies that this is not generally the case. In particular, using Pearson product-moment correlations instead of maximum likelihood estimates of polychoric correlations biases the estimated correlations towards zero. This bias is especially severe when a plurality of the observations fall into a single observed category, such as a score of zero. By contrast, estimating the ordinal correlation by maximum likelihood yields no estimation bias, although the standard errors are (appropriately) larger. We also illustrate how odds ratios depend critically on the proportion, or prevalence, of affected individuals in the population, and are therefore sub-optimal for studies in which association metrics must be compared.
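The attenuation and prevalence effects described above can be illustrated for the simplest (binary, tetrachoric) case of the liability-threshold model. The sketch below is a minimal illustration, not the authors' simulation code. It assumes two standard-normal liabilities with a latent correlation of 0.5, both dichotomized at the same prevalence. The helper names (`upper_orthant`, `phi_from_latent`, `odds_ratio_from_latent`, `tetrachoric`, `pearson`) are hypothetical, and only the Python standard library is used. The first part computes population values up to numerical integration: the Pearson correlation of the 0/1 codes (the phi coefficient) and the odds ratio implied by the same fixed latent correlation at several prevalences. The second part draws a sample at 10% prevalence and estimates the latent correlation by two-step maximum likelihood, fixing the thresholds from the marginal proportions and then solving for the correlation.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def upper_orthant(a, b, rho, steps=2000):
    """P(X > a, Y > b) for standard bivariate normal (X, Y) with correlation rho.

    Simpson's rule on  int_a^inf phi(x) * P(Y > b | X = x) dx, truncated at
    a + 10, beyond which the remaining normal tail mass is negligible.
    """
    sd = math.sqrt(1.0 - rho * rho)
    upper = a + 10.0
    h = (upper - a) / steps

    def f(x):
        return N01.pdf(x) * (1.0 - N01.cdf((b - rho * x) / sd))

    total = f(a) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def phi_from_latent(rho, prev):
    """Population Pearson correlation of the 0/1 codes (the phi coefficient)."""
    t = N01.inv_cdf(1.0 - prev)
    p11 = upper_orthant(t, t, rho)
    return (p11 - prev * prev) / (prev * (1.0 - prev))


def odds_ratio_from_latent(rho, prev):
    """Population odds ratio of the 2x2 table implied by the latent correlation."""
    t = N01.inv_cdf(1.0 - prev)
    p11 = upper_orthant(t, t, rho)
    p10 = prev - p11  # affected on X only; equals "Y only" because thresholds match
    p00 = 1.0 - 2.0 * prev + p11
    return (p11 * p00) / (p10 * p10)


def tetrachoric(x, y, tol=1e-6):
    """Two-step ML tetrachoric correlation for two lists of 0/1 codes.

    With thresholds fixed at the observed marginals, the likelihood is maximized
    where the model-implied P(X=1, Y=1) equals the observed proportion. That
    probability increases with rho, so bisection finds the estimate.
    """
    n = len(x)
    p11 = sum(1 for xi, yi in zip(x, y) if xi and yi) / n
    tx = N01.inv_cdf(1.0 - sum(x) / n)
    ty = N01.inv_cdf(1.0 - sum(y) / n)
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if upper_orthant(tx, ty, mid) < p11:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


RHO = 0.5  # assumed latent (liability-scale) correlation
PREVALENCES = [0.50, 0.25, 0.10, 0.05]
table = {p: (phi_from_latent(RHO, p), odds_ratio_from_latent(RHO, p)) for p in PREVALENCES}
for p, (phi, odds) in table.items():
    print(f"prevalence {p:.2f}: latent r = {RHO}, phi = {phi:.3f}, odds ratio = {odds:.2f}")

# Simulated sample: dichotomize correlated liabilities at 10% prevalence.
rng = random.Random(2020)
t = N01.inv_cdf(0.90)
x, y = [], []
for _ in range(20000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = RHO * z1 + math.sqrt(1.0 - RHO ** 2) * rng.gauss(0.0, 1.0)
    x.append(int(z1 > t))
    y.append(int(z2 > t))
r_pearson, r_tet = pearson(x, y), tetrachoric(x, y)
print(f"sample: Pearson = {r_pearson:.3f}, ML tetrachoric = {r_tet:.3f}")
```

At 50% prevalence the population phi coefficient is (2/pi)·arcsin(0.5) = 1/3 and the odds ratio is 4. As prevalence falls, phi shrinks further below the latent correlation, while the odds ratio grows even though the underlying association is held fixed. Ordinal items with several thresholds (polychoric correlations) follow the same logic but need a multi-threshold likelihood, which this sketch does not attempt.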
Finally, we extend these analyses to the classical twin model and demonstrate that treating binary data as continuous underestimates the genetic and common environmental variance components and overestimates the unique environmental (residual) variance. These biases increase as prevalence declines. While modeling ordinal data appropriately may be more computationally intensive and time-consuming, failing to do so will likely yield biased correlations and, in turn, biased parameter estimates from the models fitted to them.
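The direction of the twin-model biases can be checked with a small population-level calculation. The sketch below compares ACE estimates derived from liability-scale (tetrachoric) twin correlations with estimates derived from the Pearson (phi) correlations of the 0/1 codes. It assumes hypothetical true components a2 = 0.5, c2 = 0.2, and e2 = 0.3, which imply latent correlations r_MZ = a2 + c2 = 0.7 and r_DZ = a2/2 + c2 = 0.45, with both twins dichotomized at the same prevalence. As a deliberate simplification, it uses Falconer's formulas (a2 = 2(r_MZ - r_DZ), c2 = 2 r_DZ - r_MZ, e2 = 1 - r_MZ) rather than fitting a full classical twin model. The helper names are illustrative, and only the Python standard library is used.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def upper_orthant(a, b, rho, steps=2000):
    """P(X > a, Y > b) for standard bivariate normal (X, Y) with correlation rho,
    via Simpson's rule on int_a^(a+10) phi(x) * P(Y > b | X = x) dx."""
    sd = math.sqrt(1.0 - rho * rho)
    upper = a + 10.0
    h = (upper - a) / steps

    def f(x):
        return N01.pdf(x) * (1.0 - N01.cdf((b - rho * x) / sd))

    total = f(a) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def phi_from_latent(rho, prev):
    """Population Pearson correlation of 0/1 twin codes thresholded at `prev`."""
    t = N01.inv_cdf(1.0 - prev)
    p11 = upper_orthant(t, t, rho)
    return (p11 - prev * prev) / (prev * (1.0 - prev))


def falconer(r_mz, r_dz):
    """Falconer-style ACE decomposition of standardized twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2


A2, C2, E2 = 0.5, 0.2, 0.3         # hypothetical true liability-scale components
R_MZ, R_DZ = A2 + C2, A2 / 2 + C2  # implied latent twin correlations (0.7, 0.45)

# Population tetrachoric correlations equal the latent ones, so this recovers the truth.
liability = falconer(R_MZ, R_DZ)
print("liability scale: a2=%.3f c2=%.3f e2=%.3f" % liability)

results = {}
for prev in [0.50, 0.25, 0.10, 0.05]:
    # Treat the 0/1 codes as continuous: decompose the phi correlations instead.
    binary = falconer(phi_from_latent(R_MZ, prev), phi_from_latent(R_DZ, prev))
    results[prev] = binary
    print(f"prevalence {prev:.2f}, 0/1 treated as continuous: "
          f"a2={binary[0]:.3f} c2={binary[1]:.3f} e2={binary[2]:.3f}")
```

Under these assumed values, the phi-based decomposition gives smaller a2 and c2 and a larger e2 than the true components at every prevalence tried. The inflation of e2 grows as prevalence declines because the MZ phi correlation shrinks toward the tails.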
Pages: 204-214
Page count: 11