Visualizing Profiles of Large Datasets of Weighted and Mixed Data

Cited by: 4
Authors
Grane, Aurea [1]
Sow-Barry, Alpha A. [1]
Affiliations
[1] Univ Carlos III Madrid, Dept Stat, Getafe 28903, Spain
Keywords
clustering; Gower's interpolation formula; Gower's metric; mixed data; multidimensional scaling; ALGORITHM; HEALTH; EUROPE;
DOI
10.3390/math9080891
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701; 070101;
Abstract
This work provides a procedure for constructing and visualizing profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques: multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS on large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower's interpolation formula is then used to project the remaining individuals onto this configuration. Throughout the process, Gower's distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results indicate that the new proposal overcomes the high computational cost of classical MDS while incurring low error.
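The sample-then-project idea in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: it uses an unweighted, simplified Gower similarity (range-normalized numeric variables plus simple matching on categorical ones) with squared distance 1 - similarity, it omits the k-prototypes clustering step entirely, and all function names (`gower_sq_dist`, `classical_mds`, `gower_interpolate`) and the synthetic data are hypothetical.

```python
import numpy as np

def gower_sq_dist(a_num, a_cat, b_num, b_cat, ranges):
    """Squared Gower distance (1 - similarity) between rows of a and rows of b.
    Simplified, unweighted variant assumed for illustration."""
    # Numeric variables: similarity 1 - |x - y| / range, summed over variables.
    num_sim = (1.0 - np.abs(a_num[:, None, :] - b_num[None, :, :]) / ranges).sum(axis=2)
    # Categorical variables: simple matching, summed over variables.
    cat_sim = (a_cat[:, None, :] == b_cat[None, :, :]).sum(axis=2)
    p = a_num.shape[1] + a_cat.shape[1]
    return 1.0 - (num_sim + cat_sim) / p

def classical_mds(d2, k=2):
    """Classical MDS on an n x n matrix of squared distances."""
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ d2 @ J                      # doubly centered inner products
    eigval, eigvec = np.linalg.eigh(B)
    top = np.argsort(eigval)[::-1][:k]         # k largest eigenvalues
    lam = eigval[top]
    X = eigvec[:, top] * np.sqrt(lam)          # principal coordinates
    return X, lam, np.diag(B)

def gower_interpolate(d2_new, X, lam, b):
    """Gower's interpolation: x_new = 1/2 * Lambda^{-1} X^T (b - d2_new).
    d2_new holds squared distances from each new point to the sample points."""
    return 0.5 * (b[None, :] - d2_new) @ X / lam

# Synthetic mixed data (assumption: 3 numeric and 2 categorical variables).
rng = np.random.default_rng(0)
N, n = 2000, 200
num = rng.normal(size=(N, 3))
cat = rng.integers(0, 4, size=(N, 2))
ranges = num.max(axis=0) - num.min(axis=0)     # ranges taken over the full dataset

# Step 1: small random sample, mapped with classical MDS.
sample = rng.choice(N, size=n, replace=False)
rest = np.setdiff1d(np.arange(N), sample)
D2 = gower_sq_dist(num[sample], cat[sample], num[sample], cat[sample], ranges)
X_sample, lam, b = classical_mds(D2, k=2)

# Step 2: project the remaining individuals onto the sample configuration.
d2_rest = gower_sq_dist(num[rest], cat[rest], num[sample], cat[sample], ranges)
X_rest = gower_interpolate(d2_rest, X_sample, lam, b)
```

Only the n x n sample matrix is eigendecomposed; each remaining individual needs just its n distances to the sample, which is what keeps the cost manageable as N grows. A quick sanity check of the sketch is that interpolating the sample points themselves reproduces their MDS coordinates.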
Pages: 20
Related papers
50 in total
  • [21] A Data-Driven Approach for Extracting Representative Information From Large Datasets With Mixed Attributes
    Wu, Feng
    Huang, Xin
    Jiang, Bin
    IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, 2022, 69 (05) : 1806 - 1822
  • [22] A Space-Time GIS for Visualizing and Analyzing Clusters in Large Tracking Datasets
    Yu, Hongbo
    SPATIAL DATA HANDLING IN BIG DATA ERA, 2017, : 93 - 109
  • [23] Visualizing and Interpreting Single-Cell Gene Expression Datasets with Similarity Weighted Nonnegative Embedding
    Wu, Yan
    Tamayo, Pablo
    Zhang, Kun
    CELL SYSTEMS, 2018, 7 (06) : 656 - +
  • [24] Visualizing and Interpreting Large Sensor Datasets for Daily Specialty Crop Management Decisions
    Lea-Cox, John D.
    HORTSCIENCE, 2011, 46 (09) : S76 - S76
  • [25] Quantitatively Visualizing Bipartite Datasets
    Einav, Tal
    Khoo, Yuehaw
    Singer, Amit
    PHYSICAL REVIEW X, 2023, 13 (02)
  • [26] Adapted methods for clustering large datasets of mixed units
    IMFM Ljubljana, Dept. of TCS, Jadranska 19, 1000 Ljubljana, Slovenia
    Inf, 4 (507-511):
  • [27] Visualizing Large Scale Scientific Data Provenance
    Chen, Peng
    Plale, Beth
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1388 - 1388
  • [28] Data Abstraction for Visualizing Large Time Series
    Shurkhovetskyy, G.
    Andrienko, N.
    Andrienko, G.
    Fuchs, G.
    COMPUTER GRAPHICS FORUM, 2018, 37 (01) : 125 - 144
  • [29] Visualizing and segmenting large volumetric data sets
    Senger, Steven
    IEEE Computer Graphics and Applications, 19 (03) : 32 - 37
  • [30] VISUALIZING LARGE DATA SETS IN THE EARTH SCIENCES
    HIBBARD, W
    SANTEK, D
    COMPUTER, 1989, 22 (08) : 53 - 57