Approximating scatterplots of large datasets using distribution splats

Authors
Camuto, M [1]
Crawfis, R [1]
Becker, B [1]
Affiliation
[1] Ohio State Univ, Dept Comp & Informat Sci, Columbus, OH 43210 USA
Source
Keywords
information visualization; categorical; nominal; splatting; volume rendering; relational data;
DOI
10.1117/12.378890
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Many situations exist where the plotting of large data sets with categorical attributes is desired in a 3D coordinate system. For example, a marketing company may conduct a survey involving one million subjects and then plot people's favorite car type against their weight, height and annual income. Scatter point plotting, in which each point is individually plotted at its corresponding Cartesian location using a defined primitive, is usually used to render a plot of this type. If the dependent variable is continuous, we can discretize the three-dimensional space into bins or voxels and retain the average value of all records falling within each voxel. Previous work employed volume rendering techniques, in particular splatting, to represent this aggregated data by mapping each average value to a representative color. However, if the color-mapped attribute is categorical, then this technique is inadequate, since we cannot represent or calculate an average value to which a single color is assigned. This paper presents a new technique called distribution splatting. Our method is not only faster than traditional scatter plotting for massive data sets, but also visually preserves the sample density within the volume. The main problem examined in this paper is the representation of all the attributes at a voxel using a single splat. We represent a splat with a multi-colored tessellated hexagon. The number of elements, opacity and size within each splat can be modulated with respect to record size to show the sample density accurately. Other techniques, such as percentage thresholding and an evidence viewing mode, are presented as extensions. Examples from real data sets are presented to display the effectiveness of our method.
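The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bin_records` and `splat_parameters`, the linear opacity mapping, and the reading of "percentage thresholding" as dropping categories below a minimum share are all assumptions made for illustration. The sketch bins records into voxels and keeps a count per category rather than an average, since no average exists for a categorical attribute.

```python
from collections import Counter, defaultdict


def bin_records(records, bounds, resolution):
    """Group (x, y, z, category) records into voxels, keeping per-category counts.

    bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax)); resolution: voxels per axis.
    Hypothetical helper, not taken from the paper.
    """
    voxels = defaultdict(Counter)
    for x, y, z, category in records:
        index = []
        for value, (lo, hi) in zip((x, y, z), bounds):
            # Map the coordinate to a voxel index; clamp so edge values stay in range.
            i = int((value - lo) / (hi - lo) * resolution)
            index.append(min(max(i, 0), resolution - 1))
        voxels[tuple(index)][category] += 1
    return voxels


def splat_parameters(counts, max_count, threshold=0.0):
    """Derive illustrative splat parameters from one voxel's category counts.

    Returns (shares, opacity). `shares` maps each category to its fraction of
    the voxel's records; categories below `threshold` are dropped (an assumed
    reading of percentage thresholding). Opacity grows linearly with the
    voxel's record count (also an assumption).
    """
    total = sum(counts.values())
    shares = {c: n / total for c, n in counts.items() if n / total >= threshold}
    opacity = total / max_count
    return shares, opacity


# Toy survey records: three coordinates plus a categorical attribute.
records = [
    (0.1, 0.1, 0.1, "sedan"),
    (0.2, 0.1, 0.2, "sedan"),
    (0.3, 0.2, 0.1, "truck"),
    (0.9, 0.9, 0.9, "coupe"),
]
bounds = ((0.0, 1.0), (0.0, 1.0), (0.0, 1.0))
voxels = bin_records(records, bounds, resolution=2)
max_count = max(sum(c.values()) for c in voxels.values())
for index, counts in sorted(voxels.items()):
    print(index, splat_parameters(counts, max_count))
```

Each voxel's `shares` could then drive how many elements of a tessellated hexagon receive each category's color, with `opacity` conveying sample density; how the paper actually maps these quantities is not specified in the abstract.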
Pages: 144-154
Page count: 11