Reducing Data Complexity Using Autoencoders With Class-Informed Loss Functions

Cited by: 10

Authors:

Charte, David [1]

Charte, Francisco [2]

Herrera, Francisco [1,3]

Affiliations:

[1] Univ Granada, Comp Sci & AI Dept, Granada 18071, Spain

[2] Univ Jaen, Comp Sci Dept, Jaen 23071, Spain

[3] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah 21589, Saudi Arabia

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022, Vol. 44, No. 12

Keywords:

Complexity theory; Feature extraction; Measurement; Shape; Support vector machines; Data models; Transforms; Autoencoders; dimension reduction; data complexity; DIMENSIONALITY REDUCTION; FEATURE-SELECTION;

DOI:

10.1109/TPAMI.2021.3127698

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Available data in machine learning applications is becoming increasingly complex, due to higher dimensionality and difficult classes. There exists a wide variety of approaches to measuring complexity of labeled data, according to class overlap, separability or boundary shapes, as well as group morphology. Many techniques can transform the data in order to find better features, but few focus on specifically reducing data complexity. Most data transformation methods mainly treat the dimensionality aspect, leaving aside the available information within class labels which can be useful when classes are somehow complex. This paper proposes an autoencoder-based approach to complexity reduction, using class labels in order to inform the loss function about the adequacy of the generated variables. This leads to three different new feature learners, Scorer, Skaler and Slicer. They are based on Fisher's discriminant ratio, the Kullback-Leibler divergence and least-squares support vector machines, respectively. They can be applied as a preprocessing stage for a binary classification problem. A thorough experimentation across a collection of 27 datasets and a range of complexity and classification metrics shows that class-informed autoencoders perform better than 4 other popular unsupervised feature extraction techniques, especially when the final objective is using the data for a classification task.
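The idea of a class-informed loss can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that one of the learners is based on Fisher's discriminant ratio, so the function names (`fisher_ratio`, `class_informed_loss`), the penalty form `1 / (1 + mean ratio)`, and the `weight` parameter are all assumptions made for illustration. The sketch adds a label-dependent penalty on the encoded features to the usual reconstruction error, for a binary problem with labels 0 and 1.

```python
import numpy as np

def fisher_ratio(codes, labels, eps=1e-12):
    """Per-feature Fisher discriminant ratio for binary labels (0 and 1)."""
    codes = np.asarray(codes, dtype=float)
    labels = np.asarray(labels)
    class_a, class_b = codes[labels == 0], codes[labels == 1]
    # Squared distance between class means, per encoded feature.
    between = (class_a.mean(axis=0) - class_b.mean(axis=0)) ** 2
    # Sum of the within-class variances, per encoded feature.
    within = class_a.var(axis=0) + class_b.var(axis=0)
    return between / (within + eps)

def class_informed_loss(inputs, reconstructions, codes, labels, weight=0.1):
    """Reconstruction error plus a penalty that shrinks as classes separate.

    Hypothetical combination: the penalty 1 / (1 + mean ratio) and the
    weighting scheme are assumptions, not taken from the paper.
    """
    inputs = np.asarray(inputs, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    reconstruction_error = np.mean((inputs - reconstructions) ** 2)
    penalty = 1.0 / (1.0 + fisher_ratio(codes, labels).mean())
    return reconstruction_error + weight * penalty

# Toy usage: identical reconstructions, two candidate one-dimensional encodings.
labels = np.array([0, 0, 1, 1])
inputs = np.zeros((4, 3))
separated = np.array([[0.0], [0.2], [5.0], [5.2]])    # classes far apart
overlapping = np.array([[0.0], [5.0], [0.2], [5.2]])  # classes interleaved
loss_separated = class_informed_loss(inputs, inputs, separated, labels)
loss_overlapping = class_informed_loss(inputs, inputs, overlapping, labels)
```

With reconstruction held equal, the encoding that separates the two classes receives the lower loss, which is the behaviour a class-informed term is meant to reward. In an actual autoencoder the same quantity would be written in a differentiable framework so it can be minimized by gradient descent.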

Pages: 9549-9560

Number of pages: 12
