Approximating scatterplots of large datasets using distribution splats

Authors
Camuto, M [1]
Crawfis, R [1]
Becker, B [1]
Affiliations
[1] Ohio State Univ, Dept Comp & Informat Sci, Columbus, OH 43210 USA
Keywords
information visualization; categorical; nominal; splatting; volume rendering; relational data
DOI
10.1117/12.378890
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Many situations call for plotting large data sets with categorical attributes in a 3D coordinate system. For example, a marketing company may conduct a survey involving one million subjects and then plot people's favorite car type against their weight, height, and annual income. Scatter point plotting, in which each point is individually plotted at its corresponding Cartesian location using a defined primitive, is usually used to render a plot of this type. If the dependent variable is continuous, we can discretize the three-dimensional space into bins or voxels and retain the average value of all records falling within each voxel. Previous work employed volume rendering techniques, in particular splatting, to represent this aggregated data by mapping each average value to a representative color. However, if the color-mapped attribute is categorical, this technique is inadequate, since we cannot represent or calculate an average value to which a single color is assigned. This paper presents a new technique called distribution splatting. Our method is not only faster than traditional scatter plotting for massive data sets, but also visually preserves the sample density within the volume. The main problem examined in this paper is the representation of all the attributes at a voxel using a single splat. We represent a splat with a multi-colored tessellated hexagon. The number of elements, opacity, and size within each splat can be modulated with respect to record size to show the sample density accurately. Other techniques, such as percentage thresholding and an evidence viewing mode, are presented as extensions. Examples from real data sets are presented to display the effectiveness of our method.
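The aggregation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bin_records` and `allocate_elements`, the default of 24 elements per hexagon, the largest-remainder rounding, and the reading of "percentage thresholding" as dropping categories below a given share are all assumptions made for illustration. The sketch bins categorical records into voxels, keeps a per-voxel category histogram instead of an average, and divides a splat's elements among categories in proportion to their share.

```python
from collections import Counter, defaultdict


def bin_records(records, voxel_size):
    """Group (x, y, z, category) records into cubic voxels.

    Returns a dict mapping an integer voxel index (i, j, k) to a Counter
    of category occurrences, i.e. a per-voxel distribution rather than
    a single averaged value.
    """
    voxels = defaultdict(Counter)
    for x, y, z, category in records:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key][category] += 1
    return voxels


def allocate_elements(counts, n_elements=24, threshold=0.0):
    """Divide a splat's n_elements among categories by their share.

    Assumption: categories whose fraction of the voxel's records is below
    `threshold` are dropped. Largest-remainder rounding (also an assumption)
    keeps the allocated total equal to n_elements.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    kept = {c: n for c, n in counts.items() if n / total >= threshold}
    kept_total = sum(kept.values())
    if kept_total == 0:
        return {}
    quotas = {c: n * n_elements / kept_total for c, n in kept.items()}
    alloc = {c: int(q) for c, q in quotas.items()}
    leftover = n_elements - sum(alloc.values())
    # Hand remaining elements to the categories with the largest fractional part.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


# Hypothetical survey records: (weight, height, income, favorite car type),
# already scaled to a common unit so that one voxel has side length 1.0.
records = [
    (0.2, 0.1, 0.3, "sedan"),
    (0.4, 0.9, 0.5, "sedan"),
    (0.7, 0.2, 0.1, "truck"),
    (1.5, 0.2, 0.2, "suv"),
]
voxels = bin_records(records, voxel_size=1.0)
splat = allocate_elements(voxels[(0, 0, 0)], n_elements=6)
```

In this sketch each voxel yields one splat regardless of how many records fall inside it, which is where a speed advantage over per-point plotting would come from; the record count per voxel (`sum(counts.values())`) is what would drive opacity or size in a renderer.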
Pages: 144-154
Page count: 11