FIF: A NLP-based Feature Identification Framework for Data Warehouses

Times cited: 0
Authors
Chouhan, Ashish [1]
Prabhune, Ajinkya [1]
Affiliation
[1] SRH Univ Heidelberg, Heidelberg, Germany
Keywords
feature selection; data warehouses; topic modeling; data mining; microservices; dimensionality space reduction; model selection; CLASSIFICATION; SELECTION;
DOI
10.1145/3350546.3352530
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In a data warehouse, selecting the relevant features is an iterative process that is laborious, time-consuming, and error-prone due to selection bias introduced by either the data expert or the data analyst. To address this challenge, this paper introduces FIF, a Feature Identification Framework that uses Natural Language Processing (NLP) to analyze hypotheses, identify the relevant feature space, and predict the appropriate data mining task and model. FIF is designed on the principles of the microservices architecture pattern and comprises five core groups of microservices: (a) NLP Pre-processor, (b) Attribute Identifier, (c) Feature Identifier, (d) Topic Modeller, and (e) Data Mining Task Evaluator. Finally, FIF is evaluated with five hypotheses against our data warehouse.
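The abstract describes FIF as five groups of microservices that a hypothesis passes through in turn. The Python sketch below is a minimal, hypothetical illustration of such a staged pipeline, not the authors' implementation. Plain function calls stand in for separate services. The stop-word list, the cue-word table (`TASK_CUES`), the rule that widens matched attributes to whole tables, the frequency count used in place of a real topic model, and all function names are assumptions introduced only for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stop-word list; the abstract does not describe FIF's actual pre-processing.
STOP_WORDS = {"a", "an", "and", "by", "does", "in", "is", "of", "on", "the", "to", "with"}

# Hypothetical cue words mapping hypothesis wording to a data mining task.
TASK_CUES = {
    "predict": "regression",
    "classify": "classification",
    "cluster": "clustering",
    "segment": "clustering",
}


@dataclass
class Analysis:
    """State handed from one (hypothetical) stage to the next."""
    hypothesis: str
    tokens: list = field(default_factory=list)
    attributes: list = field(default_factory=list)
    features: list = field(default_factory=list)
    topics: list = field(default_factory=list)
    task: str = "unknown"


def nlp_preprocess(a: Analysis) -> Analysis:
    # (a) NLP Pre-processor: lowercase, tokenize on letters/underscores, drop stop words.
    words = re.findall(r"[a-z_]+", a.hypothesis.lower())
    a.tokens = [w for w in words if w not in STOP_WORDS]
    return a


def identify_attributes(a: Analysis, schema: dict) -> Analysis:
    # (b) Attribute Identifier: warehouse columns whose names occur among the tokens.
    a.attributes = [col for col in schema if col in a.tokens]
    return a


def identify_features(a: Analysis, schema: dict) -> Analysis:
    # (c) Feature Identifier: assumed rule that widens the feature space to every
    # column of each table containing a matched attribute.
    tables = {schema[col] for col in a.attributes}
    a.features = sorted(col for col, table in schema.items() if table in tables)
    return a


def model_topics(a: Analysis, k: int = 2) -> Analysis:
    # (d) Topic Modeller: a simple frequency count standing in for a real topic model.
    rest = Counter(t for t in a.tokens if t not in a.attributes)
    a.topics = [word for word, _ in rest.most_common(k)]
    return a


def evaluate_task(a: Analysis) -> Analysis:
    # (e) Data Mining Task Evaluator: the first cue word found decides the task.
    for token in a.tokens:
        if token in TASK_CUES:
            a.task = TASK_CUES[token]
            break
    return a


def run_pipeline(hypothesis: str, schema: dict) -> Analysis:
    # Chain the five stages in the order the abstract lists them.
    a = nlp_preprocess(Analysis(hypothesis))
    a = identify_attributes(a, schema)
    a = identify_features(a, schema)
    a = model_topics(a)
    return evaluate_task(a)


if __name__ == "__main__":
    # Toy warehouse schema mapping column name -> table name (hypothetical).
    schema = {"age": "customer", "income": "customer", "region": "customer",
              "revenue": "sales", "discount": "sales"}
    print(run_pipeline("Predict revenue by discount and region", schema))
```

For the toy hypothesis "Predict revenue by discount and region", the sketch matches `region`, `revenue` and `discount` as attributes. It then widens them to all five columns of the two tables and maps the cue word "predict" to a regression task. Keeping each stage as a separate function that takes and returns the shared `Analysis` state mirrors the idea of independent services, since any one stage could be replaced without touching the others.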
Pages: 276-281
Number of pages: 6