Scalable aggregation predictive analytics: A query-driven machine learning approach

Authors
Christos Anagnostopoulos
Fotis Savva
Peter Triantafillou
Affiliation
[1] University of Glasgow, School of Computing Science
Source
Applied Intelligence | 2018 / Vol. 48
Keywords
Query-driven predictive analytics; Predictive modeling; Aggregation operators; Set cardinality prediction; Regression vector quantization; Self-organizing maps
DOI
Not available
Abstract
We introduce a predictive modeling solution that provides high-quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology uses historical queries and their answers to accurately predict the answers of ad-hoc queries. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high-quality prediction results. The significance of our contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark’s COUNT method.
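The abstract names the ingredients (quantizing the query space, local linear regression, incremental learning from query-answer pairs) but does not give the algorithm. The following is only a minimal sketch of that general idea, not the authors' model. It assumes 1-D range queries encoded as (center, radius), a leader-style online quantizer, and a least-mean-squares update for each local linear model; the class name, function names, and all parameter values are hypothetical.

```python
import bisect
import math
import random


class QueryDrivenCountPredictor:
    """Sketch: online prototypes over query vectors, one local linear model each."""

    def __init__(self, radius=0.1, lr=0.2):
        self.radius = radius   # assumed: max distance to a prototype, also the feature scale
        self.lr = lr           # assumed: learning rate of the local LMS update
        self.prototypes = []   # prototype query vectors
        self.hits = []         # number of queries absorbed by each prototype
        self.models = []       # per prototype: [bias, slope_1, ..., slope_d]

    def _nearest(self, q):
        dists = [math.dist(q, w) for w in self.prototypes]
        k = min(range(len(dists)), key=dists.__getitem__)
        return k, dists[k]

    def _features(self, k, q):
        # Offsets from the prototype, scaled to roughly [-1, 1].
        return [(x - c) / self.radius for x, c in zip(q, self.prototypes[k])]

    def _evaluate(self, k, q):
        m = self.models[k]
        return m[0] + sum(s * z for s, z in zip(m[1:], self._features(k, q)))

    def predict(self, q):
        """Predict the COUNT answer of query q without touching the data."""
        if not self.prototypes:
            raise ValueError("no query-answer pairs seen yet")
        k, _ = self._nearest(q)
        return self._evaluate(k, q)

    def update(self, q, y):
        """Learn incrementally from one executed query q and its true answer y."""
        k, d = self._nearest(q) if self.prototypes else (None, math.inf)
        if d > self.radius:
            # Unfamiliar region of the query space: start a new prototype.
            self.prototypes.append(list(q))
            self.hits.append(1)
            self.models.append([float(y)] + [0.0] * len(q))
            return
        m = self.models[k]
        z = self._features(k, q)
        err = y - self._evaluate(k, q)
        m[0] += self.lr * err
        for i, zi in enumerate(z):
            m[i + 1] += self.lr * err * zi
        # Move the prototype toward q (online k-means step) and shift the
        # bias so the local linear function itself is unchanged by the move.
        self.hits[k] += 1
        step = 1.0 / self.hits[k]
        w = self.prototypes[k]
        for i, x in enumerate(q):
            delta = step * (x - w[i])
            w[i] += delta
            m[0] += m[i + 1] * delta / self.radius


def true_count(sorted_data, center, radius):
    """Exact COUNT of points in [center - radius, center + radius]."""
    lo = bisect.bisect_left(sorted_data, center - radius)
    hi = bisect.bisect_right(sorted_data, center + radius)
    return hi - lo


def demo_mean_relative_error(seed=0, n_points=10000, n_train=20000, n_test=500):
    """Train on synthetic query-answer pairs, return mean relative error on new queries."""
    rng = random.Random(seed)
    data = sorted(rng.random() for _ in range(n_points))

    def random_query():
        return (rng.uniform(0.2, 0.8), rng.uniform(0.05, 0.2))

    model = QueryDrivenCountPredictor()
    for _ in range(n_train):
        q = random_query()
        model.update(q, true_count(data, *q))
    total = 0.0
    for _ in range(n_test):
        q = random_query()
        y = true_count(data, *q)
        total += abs(model.predict(q) - y) / y
    return total / n_test


if __name__ == "__main__":
    print("mean relative error:", demo_mean_relative_error())
```

In this sketch the data is read only to produce training answers; `predict` uses nothing but the stored prototypes and local models, which is the property that makes a query-driven approach usable when direct data access is restricted.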
Pages: 2546 - 2567
Number of pages: 21