Visualizing Profiles of Large Datasets of Weighted and Mixed Data

Cited by: 4
Authors
Grane, Aurea [1]
Sow-Barry, Alpha A. [1]
Affiliations
[1] Univ Carlos III Madrid, Dept Stat, Getafe 28903, Spain
Keywords
clustering; Gower's interpolation formula; Gower's metric; mixed data; multidimensional scaling; ALGORITHM; HEALTH; EUROPE
DOI
10.3390/math9080891
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
This work provides a procedure for constructing and visualizing profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques: multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS on large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower's interpolation formula is then used to project the remaining individuals onto this configuration. Throughout the process, Gower's distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results indicate that the new proposal overcomes the high computational cost of classical MDS while incurring low error.
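The sample-then-project idea in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the Gower distance is a simplified version covering only numeric and categorical variables (squared distance taken as 1 minus the Gower similarity), individual weights and the adapted k-prototypes clustering step are left out, and the function names `gower_sq_dist`, `classical_mds`, and `gower_interpolate` are hypothetical.

```python
import numpy as np

def gower_sq_dist(num_a, cat_a, num_b, cat_b, ranges):
    """Simplified squared Gower distance (1 - similarity) between rows of A and B."""
    # Numeric variables: similarity 1 - |x - y| / range, summed over variables.
    num_sim = (1.0 - np.abs(num_a[:, None, :] - num_b[None, :, :]) / ranges).sum(axis=2)
    # Categorical variables: similarity 1 when the categories match, else 0.
    cat_sim = (cat_a[:, None, :] == cat_b[None, :, :]).sum(axis=2)
    n_vars = num_a.shape[1] + cat_a.shape[1]
    return 1.0 - (num_sim + cat_sim) / n_vars

def classical_mds(sq_dist, n_dims=2):
    """Classical MDS: principal coordinates from a squared-distance matrix."""
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ sq_dist @ centering      # doubly centered inner products
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:n_dims]        # largest eigenvalues first
    lam = eigvals[top]
    coords = eigvecs[:, top] * np.sqrt(lam)
    return coords, lam, np.diag(b)

def gower_interpolate(sq_dist_new, coords, lam, b_diag):
    """Gower's interpolation: x = 1/2 * Lambda^{-1} X^T (b - d) for each new row."""
    return 0.5 * (b_diag[None, :] - sq_dist_new) @ coords / lam

# Synthetic mixed data (assumption): 3 numeric and 2 categorical variables.
rng = np.random.default_rng(0)
n_total, n_sample = 500, 50
num = rng.normal(size=(n_total, 3))
cat = rng.integers(0, 3, size=(n_total, 2))
ranges = num.max(axis=0) - num.min(axis=0)          # ranges from the full dataset

# Small random sample mapped by classical MDS.
sample = rng.choice(n_total, size=n_sample, replace=False)
rest = np.setdiff1d(np.arange(n_total), sample)
d_ss = gower_sq_dist(num[sample], cat[sample], num[sample], cat[sample], ranges)
coords, lam, b_diag = classical_mds(d_ss, n_dims=2)

# Remaining individuals projected onto the sample configuration.
d_rs = gower_sq_dist(num[rest], cat[rest], num[sample], cat[sample], ranges)
projected = gower_interpolate(d_rs, coords, lam, b_diag)
```

Only the 50 x 50 sample matrix is eigendecomposed; each remaining individual needs just its distances to the 50 sample points, which is what keeps the cost manageable as the dataset grows. A quick sanity check on the sketch is that interpolating the sample individuals themselves recovers their MDS coordinates.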
Pages: 20