FELIX: Automatic and Interpretable Feature Engineering Using LLMs

Cited by: 0

Authors:

Malberg, Simon [1]

Mosca, Edoardo [1]

Groh, Georg [1]

Affiliation:

[1] Technical University of Munich, School of Computation, Information and Technology, Munich, Germany

Source:

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT IV, ECML PKDD 2024 | 2024 / Vol. 14944

Keywords:

Large Language Models; Natural Language Processing; Feature Engineering; Text Classification

DOI:

10.1007/978-3-031-70359-1_14

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pre-processing and feature engineering are essential yet labor-intensive components of NLP. Engineers must often balance the demand for high model accuracy against interpretability, all while dealing with unstructured data. We address this issue by introducing Feature Engineering with LLMs for Interpretability and Explainability (FELIX), a novel approach that harnesses the vast world knowledge embedded in pre-trained Large Language Models (LLMs) to automatically generate a set of features describing the data. These features are human-interpretable, bring structure to text samples, and can easily be leveraged to train downstream classifiers. We test FELIX on five different text classification tasks, showing that it outperforms feature extraction baselines such as TF-IDF and LLM embeddings, as well as the zero-shot performance of state-of-the-art LLMs and a fine-tuned text classifier. Further experiments also showcase FELIX's strengths in sample efficiency and generalization, making it a low-effort and reliable method for automatic and interpretable feature extraction. We release our code and supplementary material: https://github.com/simonmalberg/felix.
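The abstract describes the pipeline only at a high level: named, human-interpretable features are generated for the data, each text sample is turned into a structured vector over those features, and a downstream classifier is trained on the vectors. The sketch below is a minimal, hypothetical illustration of that general idea and is not the FELIX implementation. The feature names (`praises_product`, `reports_defect`, `requests_refund`), the keyword rules that stand in for LLM calls, the toy reviews, and the choice of a logistic-regression classifier are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL stand-in for LLM-proposed features: each entry pairs a
# human-readable feature name with a scoring function. In an LLM-based
# approach the names and per-sample values would come from the model;
# keyword rules are used here only so the sketch runs offline.
FEATURES: Dict[str, Callable[[str], float]] = {
    "praises_product": lambda t: float(any(w in t for w in ("great", "love", "excellent"))),
    "reports_defect": lambda t: float(any(w in t for w in ("broken", "stopped working", "defect"))),
    "requests_refund": lambda t: float("refund" in t),
}


def featurize(text: str) -> List[float]:
    """Map raw text to a structured vector with one value per named feature."""
    lowered = text.lower()
    return [score(lowered) for score in FEATURES.values()]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit a logistic-regression classifier with full-batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error term of the log-loss gradient: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, text: str) -> int:
    x = featurize(text)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy, made-up data: 1 = positive review, 0 = negative review.
texts = [
    "Great blender, I love it",
    "Excellent build quality",
    "It arrived broken, I want a refund",
    "Stopped working after a week",
    "Defect in the lid, please refund me",
    "I love the colour, great value",
]
labels = [1, 1, 0, 0, 0, 1]

X = [featurize(t) for t in texts]
w, b = train_logreg(X, labels)

# Each learned weight is tied to a named feature, so the model can be read directly.
for name, weight in zip(FEATURES, w):
    print(f"{name:>16}: {weight:+.2f}")
print(predict(w, b, "The motor is broken"))  # → 0
```

Because every weight lines up with a named feature, the trained model can be inspected directly (for example, a negative weight on `reports_defect`), which is the kind of interpretability the abstract contrasts with opaque embedding dimensions. In a real system the keyword rules would be replaced by whatever feature generation and scoring procedure the method actually uses.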


Pages: 230-246

Number of pages: 17
