Scalable aggregation predictive analytics: A query-driven machine learning approach

Cited by: 0
Authors
Christos Anagnostopoulos
Fotis Savva
Peter Triantafillou
Affiliation
[1] University of Glasgow, School of Computing Science
Source
Applied Intelligence | 2018 / Vol. 48
Keywords
Query-driven predictive analytics; Predictive modeling; Aggregation operators; Set cardinality prediction; Regression vector quantization; Self-organizing maps
DOI
Not available
Abstract
We introduce a predictive modeling solution that provides high-quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over it. In this context, our methodology relies on historical queries and their answers to accurately predict the answers of ad-hoc queries. We focus on the widely used set-cardinality (i.e., COUNT) aggregation query, as COUNT is a fundamental operator both for internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from previously issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms to ensure high-quality prediction results. The significance of our contribution lies in the fact that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted to arriving ad-hoc queries, which is well suited to query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) superior to that of data-centric approaches. We provide a comprehensive performance evaluation of our model, assessing its sensitivity, scalability, and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark's COUNT method.
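The abstract names the ingredients of the model (quantizing the query space, a local linear regressor per region, a query-similarity measure, incremental updates) but not the algorithm itself. The block below is a minimal, hypothetical sketch of that general idea, not the authors' model. It assumes: queries are encoded as fixed-length numeric vectors (here, a range query's center and radius); query similarity is Euclidean distance; prototypes are learned by online competitive learning (winner-take-all vector quantization); and each prototype's local linear model is updated with a least-mean-squares (LMS) step. The class name `QueryDrivenCountPredictor` and the parameters `lr_proto` and `lr_reg` are illustrative inventions.

```python
import numpy as np


class QueryDrivenCountPredictor:
    """Hypothetical sketch: online vector quantization of the query space,
    with one local linear regressor per prototype (not the paper's exact model)."""

    def __init__(self, n_prototypes, dim, lr_proto=0.05, lr_reg=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Prototypes partition the query space; initialized uniformly in [0, 1]^dim.
        self.protos = rng.uniform(0.0, 1.0, size=(n_prototypes, dim))
        # Local linear model per prototype: count ~ b0 + b . (q - w_j)
        self.intercepts = np.zeros(n_prototypes)
        self.slopes = np.zeros((n_prototypes, dim))
        self.lr_proto = lr_proto  # assumed prototype learning rate
        self.lr_reg = lr_reg      # assumed LMS learning rate

    def _closest(self, q):
        # Query similarity (assumption): Euclidean distance between query vectors.
        return int(np.argmin(np.linalg.norm(self.protos - q, axis=1)))

    def predict(self, q):
        """Predict the COUNT answer of an unseen query vector q."""
        q = np.asarray(q, dtype=float)
        j = self._closest(q)
        return float(self.intercepts[j] + self.slopes[j] @ (q - self.protos[j]))

    def update(self, q, count):
        """Incrementally learn from one executed query and its true COUNT."""
        q = np.asarray(q, dtype=float)
        j = self._closest(q)
        d = q - self.protos[j]
        # LMS step on the winning prototype's local linear model.
        err = count - (self.intercepts[j] + self.slopes[j] @ d)
        self.intercepts[j] += self.lr_reg * err
        self.slopes[j] += self.lr_reg * err * d
        # Competitive learning: move the winning prototype toward the query.
        self.protos[j] += self.lr_proto * d


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    points = rng.uniform(0.0, 1.0, size=(5_000, 2))  # synthetic 2-D data set
    model = QueryDrivenCountPredictor(n_prototypes=16, dim=3)
    for _ in range(2_000):
        center = rng.uniform(0.0, 1.0, size=2)
        radius = rng.uniform(0.05, 0.2)
        # True COUNT: number of points within `radius` of `center`.
        count = int(np.sum(np.linalg.norm(points - center, axis=1) <= radius))
        model.update(np.append(center, radius), count)
    print(model.predict([0.5, 0.5, 0.1]))
```

Only the prototypes and per-prototype coefficients are stored, so the memory footprint is independent of the data set size, which is consistent with the query-driven setting the abstract describes, where only aggregate answers are visible. A real implementation would need better initialization, learning-rate schedules, and input scaling than this sketch assumes.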
Pages: 2546-2567
Page count: 21