DeepTLF: robust deep neural networks for heterogeneous tabular data

Cited by: 7
Authors
Borisov, Vadim [1]
Broelemann, Klaus [2]
Kasneci, Enkelejda [1]
Kasneci, Gjergji [1, 2]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] SCHUFA Holding AG, Wiesbaden, Germany
Keywords
Deep neural networks; Heterogeneous data; Tabular data; Tabular data encoding; Multimodal learning;
DOI
10.1007/s41060-022-00350-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although deep neural networks (DNNs) constitute the state of the art in many tasks based on visual, audio, or text data, their performance on heterogeneous, tabular data is typically inferior to that of decision tree ensembles. To bridge this gap while retaining the flexibility of deep learning under input heterogeneity, we propose DeepTLF, a framework for deep tabular learning. The core idea of our method is to transform the heterogeneous input data into homogeneous data to boost the performance of DNNs considerably. For the transformation step, we develop a novel knowledge distillation approach, TreeDrivenEncoder, which exploits the structure of decision trees trained on the available heterogeneous data to map the original input vectors onto homogeneous vectors that a DNN can use to improve predictive performance. Within the proposed framework, we also address the issue of multimodal learning, since it is challenging to apply decision tree ensemble methods when other data modalities are present. Through extensive and challenging experiments on various real-world datasets, we demonstrate that the DeepTLF pipeline leads to higher predictive performance. On average, our framework shows a 19.6% performance improvement in comparison to DNNs. The DeepTLF code is publicly available.
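The transformation step described in the abstract can be illustrated with a minimal, dependency-free sketch. It rests on an assumption not spelled out in this record: that each internal node of a tree contributes one binary feature, set by whether the input passes that node's split test. The node layout, the helper names `split`, `collect_splits`, and `encode`, and the hand-written toy trees are all hypothetical stand-ins for a trained ensemble; the published DeepTLF code may work differently.

```python
from typing import Dict, List, Optional, Tuple, Union

Value = Union[float, str]
# A node is either a leaf (None) or a dict holding a split and two children.
# Hypothetical structure standing in for trees from a trained ensemble.
Node = Optional[dict]


def split(feature: str, threshold: Value, left: Node = None, right: Node = None) -> dict:
    """Build an internal node; `left` and `right` default to leaves (None)."""
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}


def collect_splits(node: Node, out: List[Tuple[str, Value]]) -> None:
    """Depth-first walk that records each distinct (feature, threshold) pair once."""
    if node is None:
        return
    key = (node["feature"], node["threshold"])
    if key not in out:
        out.append(key)
    collect_splits(node["left"], out)
    collect_splits(node["right"], out)


def encode(row: Dict[str, Value], splits: List[Tuple[str, Value]]) -> List[int]:
    """Map one heterogeneous row to a homogeneous 0/1 vector, one bit per split."""
    bits = []
    for feature, threshold in splits:
        value = row[feature]
        if isinstance(threshold, str):
            bits.append(int(value == threshold))  # categorical: equality test
        else:
            bits.append(int(value <= threshold))  # numeric: threshold test
    return bits


# Two toy trees over a mixed-type table (numeric `age` and `income`, categorical `job`).
trees = [
    split("age", 30.0, left=split("job", "student"), right=split("income", 50000.0)),
    split("income", 50000.0, left=split("age", 45.0)),
]

splits: List[Tuple[str, Value]] = []
for tree in trees:
    collect_splits(tree, splits)

row = {"age": 27.0, "income": 62000.0, "job": "student"}
vector = encode(row, splits)
print(splits)
print(vector)
```

Every encoded row has the same length and contains only zeros and ones, regardless of the original column types, so it can be passed to a standard DNN in place of the raw mixed-type input.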
Pages: 85 - 100
Page count: 16
Source: International Journal of Data Science and Analytics, 2023, 16: 85 - 100