FELIX: Automatic and Interpretable Feature Engineering Using LLMs

Cited by: 0
Authors
Malberg, Simon [1]
Mosca, Edoardo [1]
Groh, Georg [1]
Affiliations
[1] Technical University of Munich, School of Computation, Information and Technology, Munich, Germany
Keywords
Large Language Models; Natural Language Processing; Feature Engineering; Text Classification
DOI
10.1007/978-3-031-70359-1_14
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pre-processing and feature engineering are essential yet labor-intensive components of NLP. Engineers must often balance the demand for high model accuracy against interpretability, all while dealing with unstructured data. We address this issue by introducing Feature Engineering with LLMs for Interpretability and Explainability (FELIX), a novel approach that harnesses the vast world knowledge embedded in pre-trained Large Language Models (LLMs) to automatically generate a set of features describing the data. These features are human-interpretable, bring structure to text samples, and can be easily leveraged to train downstream classifiers. We test FELIX across five different text classification tasks, showing that it outperforms feature extraction baselines such as TF-IDF and LLM embeddings, as well as zero-shot classification with state-of-the-art LLMs and a fine-tuned text classifier. Further experiments also showcase FELIX's strengths in terms of sample efficiency and generalization capabilities, making it a low-effort and reliable method for automatic and interpretable feature extraction. We release our code and supplementary material: https://github.com/simonmalberg/felix.
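The general idea in the abstract (turn each text into a small vector of named, human-readable features, then train an ordinary classifier on those vectors) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature names, the example texts, and the helpers `featurize`, `train_logreg`, and `predict` are all hypothetical, and a simple keyword check stands in for the LLM that would propose and score the features, so that the example runs offline with only the standard library.

```python
import math

# Hypothetical interpretable features: each is a named yes/no property of a text.
# ASSUMPTION: a keyword check stands in for the LLM that would score each feature.
FEATURES = {
    "mentions_price": ("price", "cost", "expensive", "cheap"),
    "expresses_praise": ("great", "love", "excellent"),
    "expresses_complaint": ("broken", "terrible", "refund"),
}

def featurize(text):
    """Map raw text to a vector of 0/1 feature values (placeholder for LLM scoring)."""
    lowered = text.lower()
    return [1.0 if any(k in lowered for k in kws) else 0.0
            for kws in FEATURES.values()]

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - t  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Return 1 (positive) if the linear score is non-negative, else 0."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z >= 0 else 0

# Toy labeled data (1 = positive review, 0 = negative review), invented for illustration.
texts = [
    "I love this phone, excellent battery",
    "Great value and a cheap price",
    "Arrived broken, I want a refund",
    "Terrible quality and too expensive",
]
labels = [1, 1, 0, 0]

X = [featurize(t) for t in texts]
w, b = train_logreg(X, labels)

# Each learned weight is attached to a named feature, so it can be read directly.
weights = dict(zip(FEATURES, w))
print(weights)
print(predict(w, b, featurize("Excellent, love it")))
print(predict(w, b, featurize("Broken and terrible")))
```

Because every weight belongs to a named feature, the classifier's behavior can be inspected feature by feature, which is the kind of interpretability the abstract refers to. In a real pipeline the keyword check would be replaced by LLM-generated feature values.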
Pages: 230-246
Number of pages: 17