Visualizing Profiles of Large Datasets of Weighted and Mixed Data

Cited by: 4
Authors
Grane, Aurea [1]
Sow-Barry, Alpha A. [1]
Affiliations
[1] Univ Carlos III Madrid, Dept Stat, Getafe 28903, Spain
Keywords
clustering; Gower's interpolation formula; Gower's metric; mixed data; multidimensional scaling; ALGORITHM; HEALTH; EUROPE
DOI
10.3390/math9080891
Chinese Library Classification
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
This work provides a procedure with which to construct and visualize profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques, multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known computational cost of classical MDS on large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower's interpolation formula is then used to project the remaining individuals onto this configuration. Throughout the process, Gower's distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset, obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results indicate that the proposal overcomes the high computational cost of classical MDS with low error.
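The sample-then-project idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it omits the individual weights and the k-prototypes clustering step, takes the squared Gower distance as one minus Gower's similarity, and uses synthetic data. All function names (`gower_sq_dist`, `classical_mds`, `gower_interpolate`) are hypothetical, chosen only for this example.

```python
import numpy as np

def gower_sq_dist(num_a, cat_a, num_b, cat_b, ranges):
    """Squared Gower distance (1 - similarity) between rows of a and rows of b.
    Assumes no missing values and equal weight for every variable."""
    # Numeric variables: similarity 1 - |x - y| / range
    num_sim = 1.0 - np.abs(num_a[:, None, :] - num_b[None, :, :]) / ranges
    # Categorical variables: similarity 1 if categories match, else 0
    cat_sim = (cat_a[:, None, :] == cat_b[None, :, :]).astype(float)
    p = num_a.shape[1] + cat_a.shape[1]
    return 1.0 - (num_sim.sum(axis=2) + cat_sim.sum(axis=2)) / p

def classical_mds(sq_dist, k):
    """Unweighted classical MDS: coordinates, top-k eigenvalues, diag of B."""
    n = sq_dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ sq_dist @ J               # doubly centered inner products
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1][:k]     # k largest eigenvalues
    lam = eigval[order]
    X = eigvec[:, order] * np.sqrt(lam)
    return X, lam, np.diag(B)

def gower_interpolate(sq_dist_new, X, lam, b):
    """Gower's interpolation: x = 1/2 * Lambda^{-1} X^T (b - d^2), row-wise."""
    return 0.5 * (b[None, :] - sq_dist_new) @ X / lam

# Synthetic mixed data (an assumption for illustration, not the SHARE data)
rng = np.random.default_rng(0)
N, n = 500, 50
num = rng.normal(size=(N, 3))
cat = rng.integers(0, 4, size=(N, 2))
ranges = num.max(axis=0) - num.min(axis=0)   # ranges from the full dataset

# 1) Map a small random sample with classical MDS
idx = rng.choice(N, size=n, replace=False)
D2 = gower_sq_dist(num[idx], cat[idx], num[idx], cat[idx], ranges)
X, lam, b = classical_mds(D2, k=2)

# 2) Project the remaining individuals onto the sample configuration
rest = np.setdiff1d(np.arange(N), idx)
D2_new = gower_sq_dist(num[rest], cat[rest], num[idx], cat[idx], ranges)
Y = gower_interpolate(D2_new, X, lam, b)
```

Only an n-by-n eigendecomposition is needed for the sample, and each remaining individual costs one matrix-vector product, which is where the computational saving comes from. A quick sanity check on the sketch is that interpolating the sample points themselves returns their own MDS coordinates.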
Pages: 20