Outlier Detection and Rejection in Scatterplots: Do Outliers Influence Intuitive Statistical Judgments?

Cited by: 4
Authors
Ciccione, Lorenzo [1, 2]
Dehaene, Guillaume [3]
Dehaene, Stanislas [1, 2]
Affiliations
[1] Univ Paris Saclay, INSERM, NeuroSpin Ctr, CEA, UNICOG Cognit Neuroimaging Lab, NEUROSPIN Bat 145, F-91191 Gif Sur Yvette, France
[2] Univ Paris Sci Lettres PSL, Coll France, Paris, France
[3] Univ Geneva, Dept Neurosci Fondamentales, Geneva, Switzerland
Keywords
graph perception; outliers; mental regression; attention; intuitive statistics; PERCEPTION; ATTENTION; PSYCHOLOGY;
DOI
10.1037/xhp0001065
Chinese Library Classification (CLC) number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Public Significance Statement: In all fields of science, it is quite common that a handful of observations depart from the rest of a dataset. In statistical jargon, such exceptional observations are known as outliers. In most cases, they should be ignored in order to focus the analysis on the typical case, or at least they should be analyzed separately. In our study, we tested whether human adults can perceptually detect and discard outliers when attempting to intuitively extract the trend from a scatterplot. We find that, spontaneously, adults do not reject outliers and are therefore strongly influenced by them in their statistical judgements. Furthermore, even when adults are told to detect and reject outliers, they continue to be biased by them. We propose guidelines for graphics designers to facilitate outlier detection and rejection.

According to a growing body of research, human adults are remarkably accurate at extracting intuitive statistics from graphs, such as finding the best-fitting regression line through a scatterplot. Here, we ask whether humans can also perform outlier rejection, a nontrivial statistical problem. In three experiments, we investigated human adults' capacity to evaluate the linear trend of a flashed scatterplot comprising 0-4 outlier datapoints. Experiment 1 showed that participants did not spontaneously reject outliers: when outliers were not mentioned, their presence biased the participants' trend judgments and regression line estimates. In Experiment 2, where participants were explicitly asked to exclude outliers, the outlier-induced bias was reduced but remained significant. In Experiment 3, where participants were asked to explicitly detect any outlier before adjusting their regression line, outlier detection was satisfactory, but the detected outliers continued to bias the regression responses, unless they were quite distant from the main regression line.
We propose a simple model for outlier detection, based on the computation of a z-score that estimates how far a given datapoint is from the distribution of distances to the regression line, and we show that this model closely approximates human performance. Detection is not rejection, however, and our results suggest that humans can remain biased by outliers that they have detected.
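The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of a z-score on distances to a regression line, not the authors' implementation. It assumes an ordinary least-squares fit over all points, vertical (signed) residuals as the "distance", a leave-one-out comparison of each residual against the residuals of the remaining points, and an arbitrary threshold of 3; the function names and the example data are hypothetical.

```python
import math
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (assumed fitting method)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def outlier_z_scores(xs, ys):
    """Z-score of each point's residual relative to the other points' residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    scores = []
    for i, r in enumerate(residuals):
        # Leave-one-out: compare point i with the distribution of the others.
        others = residuals[:i] + residuals[i + 1:]
        mu = statistics.fmean(others)
        sd = statistics.stdev(others)
        if sd == 0:
            # Degenerate case: all other points lie exactly on one line.
            scores.append(0.0 if r == mu else math.inf)
        else:
            scores.append(abs(r - mu) / sd)
    return scores


def detect_outliers(xs, ys, threshold=3.0):
    """Indices whose z-score exceeds the (hypothetical) threshold."""
    return [i for i, z in enumerate(outlier_z_scores(xs, ys)) if z > threshold]


# Hypothetical data: points near y = 2x + 1, with one point (index 5) moved far off.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 20.0, 13.2, 14.8, 17.1, 18.9]
print(detect_outliers(xs, ys))
```

Note that the line here is fitted with the outlier included, so the outlier still shifts the fitted trend even though it is flagged. This loosely mirrors the abstract's point that detecting an outlier is not the same as rejecting it; refitting after removing flagged points would be a separate step.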
Pages: 129-144
Page count: 16