FELIX: Automatic and Interpretable Feature Engineering Using LLMs

Times cited: 0

Authors:

Malberg, Simon [1]

Mosca, Edoardo [1]

Groh, Georg [1]

Affiliation:

[1] Tech Univ Munich, Sch Computat Informat & Technol, Munich, Germany

Source:

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT IV, ECML PKDD 2024 | 2024 / Vol. 14944

Keywords:

Large Language Models; Natural Language Processing; Feature Engineering; Text Classification

DOI:

10.1007/978-3-031-70359-1_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Pre-processing and feature engineering are essential yet labor-intensive components of NLP. Engineers must often balance the demand for high model accuracy against interpretability, all while having to deal with unstructured data. We address this issue by introducing Feature Engineering with LLMs for Interpretability and Explainability (FELIX), a novel approach harnessing the vast world knowledge embedded in pre-trained Large Language Models (LLMs) to automatically generate a set of features describing the data. These features are human-interpretable, bring structure to text samples, and can be easily leveraged to train downstream classifiers. We test FELIX across five different text classification tasks, showing that it performs better than feature extraction baselines such as TF-IDF and LLM embeddings, as well as the zero-shot performance of state-of-the-art LLMs and a fine-tuned text classifier. Further experiments also showcase FELIX's strengths in terms of sample efficiency and generalization capabilities, making it a low-effort and reliable method for automatic and interpretable feature extraction. We release our code and supplementary material: https://github.com/simonmalberg/felix.
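
The abstract describes FELIX only at a high level: an LLM automatically generates a set of human-interpretable features describing the data, text samples are represented by their values on those features, and a downstream classifier is trained on the resulting structured vectors. The Python sketch below illustrates only that last idea, under loud assumptions. The feature names and keyword rules are hypothetical stand-ins for LLM-generated features (no LLM is called), the toy texts and labels are made up, and a minimal gradient-descent logistic regression stands in for whatever downstream classifier the authors actually use. It is not the released FELIX code.

```python
import math
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL interpretable features. In FELIX the features are generated by
# an LLM; here hand-written keyword rules stand in for the LLM so that this
# sketch runs offline. Each feature has a human-readable name.
FEATURES: Dict[str, Callable[[str], bool]] = {
    "mentions_refund": lambda t: "refund" in t.lower(),
    "expresses_gratitude": lambda t: any(w in t.lower() for w in ("thanks", "thank you", "great")),
    "contains_complaint": lambda t: any(w in t.lower() for w in ("broken", "terrible", "late")),
}


def featurize(text: str) -> List[float]:
    """Map a raw text to a structured vector of named binary features."""
    return [1.0 if rule(text) else 0.0 for rule in FEATURES.values()]


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit an unregularized logistic regression with batch gradient descent."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, text: str) -> int:
    """Return 1 (complaint) if the decision score is positive, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, featurize(text))) + b
    return 1 if z > 0 else 0


# Toy, made-up data: 1 = complaint, 0 = positive feedback.
TEXTS = [
    "The package arrived broken, I want a refund",
    "Terrible service and late delivery",
    "Still waiting for my refund",
    "Thanks, great product!",
    "Thank you for the fast shipping",
    "Great quality, works as described",
]
LABELS = [1, 1, 1, 0, 0, 0]

w, b = train_logreg([featurize(t) for t in TEXTS], LABELS)

# Each learned weight is attached to a readable feature name.
for name, weight in zip(FEATURES, w):
    print(f"{name}: {weight:+.2f}")
```

Because every column of the feature matrix carries a name, the learned weights can be read directly (for example, a negative weight on `expresses_gratitude`), which illustrates why structured, named features make a downstream classifier easier to inspect. Replacing the keyword rules with LLM-generated features is the step this toy deliberately omits.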

Pages: 230–246

Number of pages: 17
