Classification of Categorical Data Based on the Chi-Square Dissimilarity and t-SNE

Cited by: 8
Authors
Cardona, Luis Ariosto Serna [1, 2]
Vargas-Cardona, Hernan Dario [3]
Navarro Gonzalez, Piedad [2]
Cardenas Pena, David Augusto [1]
Orozco Gutierrez, Alvaro Angel [1]
Affiliations
[1] Univ Tecnol Pereira, Dept Elect Engn, Pereira 660002, Colombia
[2] Corp Inst Adm Finanzas CIAF, Dept Engn, Pereira 660002, Colombia
[3] Pontificia Univ Javeriana Cali, Dept Elect & Comp Sci, Cali 760031, Colombia
Keywords
Chi-square; classification; t-SNE; categorical data; dissimilarity; BEARING FAULT-DIAGNOSIS; DECOMPOSITION; SIMILARITY; ATTRIBUTE; ALGORITHM; DISTANCE
DOI
10.3390/computation8040104
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Databases with categorical variables recur across many applications, which demands new ways to identify relevant patterns in them. Classification is a natural approach for recognizing this type of data. However, few methods for this purpose exist in the literature, and those that do focus exclusively on kernels and suffer from accuracy problems and high computational cost. For this reason, we propose an identification approach for categorical variables that uses conventional classifiers (LDC, QDC, KNN, SVM) together with different mapping techniques to increase the separability of classes. Specifically, we map the initial features (categorical attributes) to another space, using the Chi-square (C-S) as a measure of dissimilarity. Then, we employ t-SNE to reduce the dimensionality of the data to two or three features, which significantly reduces computation time in the learning methods. We evaluate the performance of the proposed approach in terms of accuracy for several experimental configurations and public categorical datasets downloaded from the UCI repository, and we compare it with relevant state-of-the-art methods. Results show that C-S mapping and t-SNE considerably reduce computation time in recognition tasks while preserving accuracy. Also, when we apply only the C-S mapping to the datasets, the separability of classes is enhanced, and thus the performance of the learning algorithms clearly increases.
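The abstract does not spell out the Chi-square dissimilarity, so the following is only a minimal sketch of one common form: the correspondence-analysis chi-square distance computed on a one-hot (indicator) encoding of the categorical attributes. The function name `chi_square_dissimilarity` and the toy records are hypothetical, and the authors' exact definition may differ.

```python
from collections import Counter
from math import sqrt


def chi_square_dissimilarity(rows):
    """Pairwise chi-square distances between categorical records.

    Assumption: correspondence-analysis form of the distance on a one-hot
    encoding; this is an illustration, not the paper's confirmed formula.
    """
    n = len(rows)
    p = len(rows[0])  # number of categorical attributes per record
    # Count how often each (attribute index, category) pair occurs.
    counts = Counter((j, v) for row in rows for j, v in enumerate(row))
    total = n * p  # grand total of the one-hot indicator matrix
    # Column mass of each one-hot column.
    mass = {key: c / total for key, c in counts.items()}

    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d2 = 0.0
            for j in range(p):
                va, vb = rows[a][j], rows[b][j]
                if va != vb:
                    # Each row profile holds 1/p on its active category, so a
                    # mismatch contributes (1/p)^2 / mass for both categories.
                    d2 += (1 / p) ** 2 * (1 / mass[(j, va)] + 1 / mass[(j, vb)])
            dist[a][b] = dist[b][a] = sqrt(d2)
    return dist


# Hypothetical toy records with two categorical attributes each.
rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
D = chi_square_dissimilarity(rows)
```

Mismatches on rare categories weigh more than mismatches on common ones, which is the property that distinguishes this distance from a plain mismatch count. A matrix like `D` could then be handed to a t-SNE implementation that accepts precomputed distances, and the resulting two or three coordinates to a conventional classifier, in the spirit of the pipeline the abstract describes.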
Pages: 1-15
Page count: 15