FELIX: Automatic and Interpretable Feature Engineering Using LLMs

Authors
Malberg, Simon [1]
Mosca, Edoardo [1]
Groh, Georg [1]
Affiliations
[1] Tech Univ Munich, Sch Computat Informat & Technol, Munich, Germany
Keywords
Large Language Models; Natural Language Processing; Feature Engineering; Text Classification
DOI
10.1007/978-3-031-70359-1_14
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pre-processing and feature engineering are essential yet labor-intensive components of NLP. Engineers must often balance the demand for high model accuracy against interpretability, all while having to deal with unstructured data. We address this issue by introducing Feature Engineering with LLMs for Interpretability and Explainability (FELIX), a novel approach harnessing the vast world knowledge embedded in pre-trained Large Language Models (LLMs) to automatically generate a set of features describing the data. These features are human-interpretable, bring structure to text samples, and can be easily leveraged to train downstream classifiers. We test FELIX across five different text classification tasks, showing that it performs better than feature extraction baselines such as TF-IDF and LLM embeddings, as well as the zero-shot performance of state-of-the-art LLMs and a fine-tuned text classifier. Further experiments also showcase FELIX's strengths in terms of sample efficiency and generalization capabilities, making it a low-effort and reliable method for automatic and interpretable feature extraction. We release our code and supplementary material: https://github.com/simonmalberg/felix.
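The abstract's central idea, mapping each text to a small set of named, human-readable features and then training a downstream classifier on them, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the toy data, and the keyword-based `extract_features` function (a stand-in for the LLM call) are all assumptions made for illustration, and a nearest-centroid classifier is used only because it needs no external libraries.

```python
from collections import defaultdict

# Hypothetical feature schema. In FELIX an LLM would propose the features;
# these two names are invented for this sketch.
FEATURES = ["mentions_refund", "positive_tone"]


def extract_features(text):
    """Stand-in for an LLM call that assigns a value to each named feature.

    Assumption: simple keyword checks replace the LLM so the sketch runs offline.
    """
    lowered = text.lower()
    return {
        "mentions_refund": int("refund" in lowered),
        "positive_tone": int(any(w in lowered for w in ("great", "love", "thanks"))),
    }


def train_centroids(samples):
    """Nearest-centroid training: average the feature vectors of each label."""
    sums = defaultdict(lambda: [0.0] * len(FEATURES))
    counts = defaultdict(int)
    for text, label in samples:
        feats = extract_features(text)
        for i, name in enumerate(FEATURES):
            sums[label][i] += feats[name]
        counts[label] += 1
    return {label: [v / counts[label] for v in vec] for label, vec in sums.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    feats = extract_features(text)
    vec = [feats[name] for name in FEATURES]

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: dist(centroids[label]))


# Toy training data, invented for illustration only.
TRAIN = [
    ("I want a refund, this is broken", "complaint"),
    ("Refund please, very disappointed", "complaint"),
    ("Great product, love it", "praise"),
    ("Thanks, works great", "praise"),
]

centroids = train_centroids(TRAIN)
```

Because every dimension of a centroid corresponds to a named feature, the learned model can be read directly, which is the interpretability property the abstract emphasizes.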
Pages: 230-246
Number of pages: 17