Clustering of mixed datasets using deep learning algorithm

Cited by: 6
Authors
Balaji, K. [1]
Lavanya, K. [1]
Mary, A. Geetha [1]
Affiliations
[1] VIT Univ, Sch Comp Sci & Engn, Vellore, Tamil Nadu, India
Keywords
Deep learning; Mixed data; Generative adversarial networks; Clustering loss; Information
DOI
10.1016/j.chemolab.2020.104123
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The performance of a clustering algorithm depends heavily on the quality and quantity of the training dataset. Deep learning is one of the most popular and successful techniques for clustering high-quality datasets. Most datasets, however, contain a mix of numeric and categorical attributes, and clustering such heterogeneous data is a complex problem. Deep learning methods, which are state-of-the-art classifiers, can address this difficulty given better learning procedures and computational resources. To improve the robustness of clusters, we propose a Constraint-Based Deep Convolutional Generative Adversarial Network (CB-DCGANs) framework that generates simulated data to augment the training set and thereby improve the performance of the clustering algorithm. We evaluated the performance of an end-to-end Deep Convolutional Neural Network (DCNN) in detecting clusters in the given datasets. CB-DCGANs with DCNN yielded a baseline accuracy of 0.8853 on the heart disease dataset. On the chemoinformatics datasets, the proposed algorithm yielded accuracies of 0.965 for the Kaggle dataset, 0.987 for the factors dataset, and 0.952 for the kinase dataset. This study shows that using generative adversarial networks for clustering augmentation can significantly improve performance, especially in real-life applications.
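The abstract does not say how mixed numeric and categorical attributes are turned into network input. As a minimal sketch of one common preprocessing step, and not the authors' CB-DCGANs pipeline, the example below z-scores the numeric columns and one-hot encodes the categorical ones so that each record becomes a single numeric vector. The function name `encode_mixed`, the column layout, and the toy records are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def encode_mixed(rows, numeric_cols, categorical_cols):
    """Return one float vector per row: z-scored numerics, then one-hot categoricals.

    Illustrative assumption only; the source does not describe its encoding.
    """
    # Per-column mean and population standard deviation for numeric attributes.
    stats = {}
    for c in numeric_cols:
        values = [float(r[c]) for r in rows]
        sd = pstdev(values)
        stats[c] = (mean(values), sd if sd > 0 else 1.0)  # guard constant columns

    # Sorted category vocabulary per categorical attribute, for a stable layout.
    vocab = {c: sorted({r[c] for r in rows}) for c in categorical_cols}

    encoded = []
    for r in rows:
        vec = [(float(r[c]) - stats[c][0]) / stats[c][1] for c in numeric_cols]
        for c in categorical_cols:
            vec.extend(1.0 if r[c] == v else 0.0 for v in vocab[c])
        encoded.append(vec)
    return encoded


# Hypothetical toy records: (age, cholesterol, chest_pain_type)
rows = [
    (63, 233, "typical"),
    (41, 204, "atypical"),
    (57, 354, "typical"),
]
vectors = encode_mixed(rows, numeric_cols=[0, 1], categorical_cols=[2])
```

Each resulting vector has two standardized numeric entries followed by one indicator per observed category. Vectors of this kind could then be passed to a generator, a discriminator, or a clustering routine, although which of these the paper does, and with what encoding, is not stated in the abstract.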
Pages: 11