DeepTLF: robust deep neural networks for heterogeneous tabular data

Cited by: 7

Authors:

Borisov, Vadim [1]

Broelemann, Klaus [2]

Kasneci, Enkelejda [1]

Kasneci, Gjergji [1,2]

Affiliations:

[1] Univ Tubingen, Tubingen, Germany

[2] SCHUFA Holding AG, Wiesbaden, Germany

Source:

INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS | 2023 / Vol. 16 / Issue 01

Keywords:

Deep neural networks; Heterogeneous data; Tabular data; Tabular data encoding; Multimodal learning;

DOI:

10.1007/s41060-022-00350-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Although deep neural networks (DNNs) constitute the state of the art in many tasks based on visual, audio, or text data, their performance on heterogeneous tabular data is typically inferior to that of decision tree ensembles. To address the difficulty DNNs have with tabular data while leveraging the flexibility of deep learning under input heterogeneity, we propose DeepTLF, a framework for deep tabular learning. The core idea of our method is to transform the heterogeneous input data into homogeneous data to boost the performance of DNNs considerably. For the transformation step, we develop a novel knowledge distillation approach, TreeDrivenEncoder, which exploits the structure of decision trees trained on the available heterogeneous data to map the original input vectors onto homogeneous vectors that a DNN can use to improve its predictive performance. Within the proposed framework, we also address the issue of multimodal learning, since it is challenging to apply decision tree ensemble methods when other data modalities are present. Through extensive and challenging experiments on various real-world datasets, we demonstrate that the DeepTLF pipeline leads to higher predictive performance. On average, our framework shows a 19.6% performance improvement in comparison to DNNs. The DeepTLF code is publicly available.
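
The transformation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' TreeDrivenEncoder implementation: the abstract only states that the structure of trained decision trees is used to map heterogeneous inputs onto homogeneous vectors. The sketch assumes one plausible reading, in which every internal tree node contributes one binary feature (the outcome of its split test). The `Split` class, the hand-written `SPLITS` list, and the rule that missing values fail every test are all hypothetical and stand in for splits that would be collected from a trained tree ensemble.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union

Value = Optional[Union[float, str]]


@dataclass(frozen=True)
class Split:
    """One internal decision-tree node (hypothetical structure for illustration)."""
    feature: int                   # index of the column the node tests
    threshold: Union[float, str]   # numeric cut point or categorical level

    def test(self, row: Sequence[Value]) -> int:
        value = row[self.feature]
        if value is None:
            return 0  # assumption: a missing value fails every split test
        if isinstance(self.threshold, str):
            return int(value == self.threshold)   # categorical equality test
        return int(value <= self.threshold)       # numeric threshold test


def encode(row: Sequence[Value], splits: Sequence[Split]) -> List[int]:
    """Map a mixed-type row onto a binary vector, one bit per tree node."""
    return [split.test(row) for split in splits]


# Hand-written splits standing in for nodes gathered from trained trees.
SPLITS = [
    Split(feature=0, threshold=30.0),             # e.g. age <= 30
    Split(feature=0, threshold=55.0),             # e.g. age <= 55
    Split(feature=1, threshold="self-employed"),  # e.g. occupation == ...
    Split(feature=2, threshold=2500.0),           # e.g. income <= 2500
]

row = [42.0, "self-employed", None]  # numeric, categorical, missing
encoded = encode(row, SPLITS)
print(encoded)
```

Under these assumptions, numeric, categorical, and missing entries all end up as bits of the same kind, which is the sense in which the encoded vector is homogeneous and can be fed to a standard DNN.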

Citation

Pages: 85 - 100

Page count: 16

50 items in total

[1] DeepTLF: robust deep neural networks for heterogeneous tabular data
Vadim Borisov
Klaus Broelemann
Enkelejda Kasneci
Gjergji Kasneci
International Journal of Data Science and Analytics, 2023, 16 : 85 - 100
[2] Deep Neural Networks and Tabular Data: A Survey
Borisov, Vadim
Leemann, Tobias
Sessler, Kathrin
Haug, Johannes
Pawelczyk, Martin
Kasneci, Gjergji
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (06) : 7499 - 7519
[3] Interpretable Graph Neural Networks for Heterogeneous Tabular Data
Alkhatib, Amr
Bostrom, Henrik
DISCOVERY SCIENCE, DS 2024, PT I, 2025, 15243 : 310 - 324
[4] Removing Neurons From Deep Neural Networks Trained With Tabular Data
Klemetti, Antti
Raatikainen, Mikko
Kivimaki, Juhani
Myllyaho, Lalli
Nurminen, Jukka K.
IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2024, 5 : 542 - 552
[5] Investigating latent representations and generalization in deep neural networks for tabular data
Couplet, Edouard
Lambert, Pierre
Verleysen, Michel
Lee, John A.
de Bodt, Cyril
NEUROCOMPUTING, 2024, 597
[6] XGBoost-Enhanced Graph Neural Networks: A New Architecture for Heterogeneous Tabular Data
Yan, Liuxi
Xu, Yaoqun
APPLIED SCIENCES-BASEL, 2024, 14 (13):
[7] Publisher Correction: Converting tabular data into images for deep learning with convolutional neural networks
Yitan Zhu
Thomas Brettin
Fangfang Xia
Alexander Partin
Maulik Shukla
Hyunseung Yoo
Yvonne A. Evrard
James H. Doroshow
Rick L. Stevens
Scientific Reports, 11
[8] Robust dimensionality reduction for data visualization with deep neural networks
Becker, Martin
Lippel, Jens
Stuhlsatz, Andre
Zielke, Thomas
GRAPHICAL MODELS, 2020, 108
[9] Locally Sparse Neural Networks for Tabular Biomedical Data
Yang, Junchen
Lindenbaum, Ofir
Kluger, Yuval
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[10] Fuzzy Convolution Neural Networks for Tabular Data Classification
Kulkarni, Arun D.
IEEE ACCESS, 2024, 12 : 151846 - 151855
