Automatic Machine Learning-Based OLAP Measure Detection for Tabular Data

Cited by: 1
Authors
Yang, Yuzhao [1]
Abdelhedi, Fatma [3]
Darmont, Jerome [2]
Ravat, Franck [1]
Teste, Olivier [1]
Affiliations
[1] Univ Toulouse, IRIT CNRS UMR 5505, Toulouse, France
[2] Univ Lyon 2, UR ERIC, Univ Lyon, Lyon, France
[3] CBI2 TRIMANE, Paris, France
Keywords
Data warehouses; OLAP; Measure detection; Tabular data; Design
DOI
10.1007/978-3-031-12670-3_15
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Nowadays, it is difficult for companies and organisations without Business Intelligence (BI) experts to carry out data analyses. Existing automatic data warehouse design methods cannot handle tabular data, which are commonly defined without a schema. Dimensions and hierarchies can still be deduced by detecting functional dependencies, but the detection of measures remains a challenge. To address this issue, we propose a machine learning-based method that detects measures by defining three categories of features for numerical columns. The method is tested on real-world datasets with various machine learning algorithms; the results show that random forest performs best for measure detection.
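The abstract does not name the three feature categories, so the following is only a minimal sketch of the general idea of describing each numerical column of a schema-less table by a feature vector. The feature names (`relative_position`, `distinct_ratio`, `stdev`, `fraction_ratio`), the helper functions, and the sample table are all hypothetical illustrations and are not taken from the paper.

```python
import statistics


def is_number(value):
    """Return True if the cell text parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def numeric_columns(rows, header):
    """Return the names of columns whose cells are all numeric."""
    return [
        name
        for i, name in enumerate(header)
        if rows and all(is_number(row[i]) for row in rows)
    ]


def column_features(rows, header, name):
    """Compute a few illustrative (hypothetical) features for one numerical column."""
    idx = header.index(name)
    values = [float(row[idx]) for row in rows]
    return {
        # Position of the column in the table (0.0 = leftmost, 1.0 = rightmost).
        "relative_position": idx / max(len(header) - 1, 1),
        # Share of distinct values; identifiers tend to be close to 1.0.
        "distinct_ratio": len(set(values)) / len(values),
        # Population standard deviation of the column.
        "stdev": statistics.pstdev(values),
        # Share of values that have a fractional part.
        "fraction_ratio": sum(not v.is_integer() for v in values) / len(values),
    }


# Hypothetical sample table without any schema information.
header = ["order_id", "store", "quantity", "amount"]
rows = [
    ["1", "Lyon", "2", "19.90"],
    ["2", "Paris", "1", "5.50"],
    ["3", "Lyon", "2", "12.00"],
    ["4", "Toulouse", "5", "40.25"],
]

# One feature vector per numerical column.
features = {
    name: column_features(rows, header, name)
    for name in numeric_columns(rows, header)
}
```

Under these assumptions, feature vectors like these, paired with labels marking which columns are measures, could be used to train a classifier; the abstract reports that random forest performed best among the algorithms the authors tested.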
Pages: 173-188
Number of pages: 16