Automatic Machine Learning-Based OLAP Measure Detection for Tabular Data

Cited by: 1
Authors
Yang, Yuzhao [1]
Abdelhedi, Fatma [3]
Darmont, Jerome [2]
Ravat, Franck [1]
Teste, Olivier [1]
Affiliations
[1] Univ Toulouse, IRIT CNRS UMR 5505, Toulouse, France
[2] Univ Lyon 2, UR ERIC, Univ Lyon, Lyon, France
[3] CBI2 TRIMANE, Paris, France
Keywords
Data warehouses; OLAP; Measure detection; Tabular data; Design
DOI
10.1007/978-3-031-12670-3_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Companies and organisations without Business Intelligence (BI) experts currently find it difficult to carry out data analyses. Existing automatic data warehouse design methods cannot handle tabular data, which is commonly defined without a schema. Dimensions and hierarchies can still be deduced by detecting functional dependencies, but the detection of measures remains a challenge. To address this issue, we propose a machine learning-based method that detects measures by defining three categories of features for numerical columns. The method is tested on real-world datasets with various machine learning algorithms; the results show that random forest performs best for measure detection.
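The abstract does not name the three feature categories, so the following is only a minimal sketch of what per-column feature extraction for numerical columns might look like. The function name `numeric_column_features`, the chosen features (distinct-value ratio, integer ratio, negative ratio, mean, standard deviation), and the toy table are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pstdev


def numeric_column_features(values):
    """Compute a few illustrative (hypothetical) features for one numerical column."""
    if not values:
        raise ValueError("column must contain at least one value")
    n = len(values)
    return {
        # Share of distinct values: identifier-like columns tend towards 1.0.
        "distinct_ratio": len(set(values)) / n,
        # Share of values with no fractional part.
        "integer_ratio": sum(1 for v in values if float(v).is_integer()) / n,
        # Share of negative values.
        "negative_ratio": sum(1 for v in values if v < 0) / n,
        # Basic distribution statistics.
        "mean": mean(values),
        "std": pstdev(values),
    }


# Toy table (assumed data): one identifier-like column, one measure-like column.
table = {
    "customer_id": [101, 102, 103, 104],
    "amount": [19.99, 5.5, 19.99, 42.0],
}

features = {name: numeric_column_features(col) for name, col in table.items()}
```

Feature dictionaries of this kind, one per numerical column, could then be labelled as measure or non-measure and passed to a classifier such as a random forest, which the abstract reports as the best-performing algorithm.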
Pages: 173-188
Number of pages: 16