A Visual-Interactive Idiom to Diagnose Missing Data Mechanisms

Cited by: 1
Authors
do Amor Divino Lima, Rodrigo Santos [1]
Oliveira de Araujo, Tiago Davi [1]
Resque dos Santos, Carlos Gustavo [1]
Meiguins, Bianchi Serique [1]
Affiliations
[1] Fed Univ Para, PPGCC, LABVIS, Belem, Para, Brazil
Keywords
missing values; data preprocessing; exploratory data analysis; imputation; regression
DOI
10.1109/IV51561.2020.00027
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With vast amounts of data come vast numbers of problems. The process of collecting data is far from perfect, whether because of human factors or technological errors, and this can lead to inaccuracies and uncertainties in the data. One such issue is missing data: the absence of information. Several methods can deal with missing values, but to choose the correct approach, it is necessary to diagnose the missing data mechanisms, which describe how the distribution of missingness in a given data variable correlates with other variables. This diagnosis can be made with statistical tests or data visualization techniques. However, statistical tests provide an uncertainty estimate that is often misinterpreted, and the visualizations readily available in data analysis packages have scalability issues, such as cognitive overload and lack of screen space. Thus, this paper proposes a visual-interactive idiom for diagnosing missing data mechanisms. The proposed solution consists of a set of visual encodings and two derived metrics that synthesize the missing data mechanisms and the uncertainty associated with this synthesis. We present the concepts behind the visual encodings, derived metrics, and interactions of the idiom.
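To make concrete the idea that missingness in one variable may correlate with other variables, here is a minimal Python sketch. It is not the paper's derived metrics, which the abstract does not specify. The function name `missingness_correlation` and the toy `age` and `income` data are hypothetical, and the Pearson correlation between a 0/1 missingness indicator and another variable is used only as one simple, assumed diagnostic.

```python
from math import sqrt


def missingness_correlation(target, other):
    """Pearson correlation between the indicator 'target is missing' (0/1)
    and the values of `other`. Missing entries are represented as None.

    Rows where `other` is itself missing are skipped. Returns None when the
    correlation is undefined (fewer than two rows, or zero variance).
    """
    pairs = [
        (1.0 if t is None else 0.0, o)
        for t, o in zip(target, other)
        if o is not None
    ]
    n = len(pairs)
    if n < 2:
        return None
    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_m == 0 or var_o == 0:
        return None
    return cov / sqrt(var_m * var_o)


# Hypothetical toy data: income is missing only for the older respondents.
age = [20, 25, 30, 35, 40, 45, 50, 55]
income = [1.0, 1.2, 1.1, 1.4, None, None, None, None]

r = missingness_correlation(income, age)
print(f"correlation between income-missingness and age: {r:.3f}")
```

A clearly non-zero value suggests that missingness depends on an observed variable, which argues against a completely-at-random mechanism. A value near zero does not establish the opposite, and no check of this kind can detect dependence on the unobserved values themselves.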
Pages: 109-113
Page count: 5