Scalable aggregation predictive analytics: A query-driven machine learning approach

Cited by: 0
Authors
Christos Anagnostopoulos
Fotis Savva
Peter Triantafillou
Affiliation
[1] University of Glasgow, School of Computing Science
Source
Applied Intelligence | 2018 / Vol. 48
Keywords
Query-driven predictive analytics; Predictive modeling; Aggregation operators; Set cardinality prediction; Regression vector quantization; Self-organizing maps
DOI
Not available
Abstract
We introduce a predictive modeling solution that provides high-quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data, e.g., by allowing only aggregation operators such as COUNT to be executed over it. In this context, our methodology uses historical queries and their answers to accurately predict the answers to ad-hoc queries. We focus on the widely used set-cardinality (COUNT) aggregation query, as COUNT is a fundamental operator both for internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from previously issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms to ensure high-quality prediction results. The significance of our contribution lies in the fact that the model (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning that adapts to arriving ad-hoc queries, which is well suited to query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) superior to that of data-centric approaches. We provide a comprehensive performance evaluation of our model, assessing its sensitivity, scalability, and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark's COUNT method.
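Goals (i) to (iv) above can be illustrated with a minimal Python sketch. It is not the authors' model. It swaps in online k-means (MacQueen-style running-mean updates) for the paper's vector quantization and plain least-mean-squares (LMS) updates for its local regression estimators. It also assumes that each range query is encoded as its center coordinates followed by its radius. The class and method names (`QueryDrivenCountPredictor`, `learn`, `predict`) and the learning rate `lr_reg` are hypothetical.

```python
import numpy as np


class QueryDrivenCountPredictor:
    """Hypothetical sketch of query-driven COUNT (set cardinality) prediction.

    Assumption: a query is a vector q = [center..., radius]. Query space is
    quantized with online k-means; each prototype owns a local linear model
        count ~= bias + slope . (q - prototype)
    trained incrementally with LMS updates. Only past queries and their
    answers are used, never the underlying data.
    """

    def __init__(self, n_prototypes, lr_reg=0.05):
        self.k = n_prototypes
        self.lr_reg = lr_reg   # LMS learning rate (assumed value)
        self.prototypes = []   # prototype (representative query) vectors
        self.counts = []       # number of queries assigned to each prototype
        self.bias = []         # local intercepts
        self.slope = []        # local slope vectors

    def _winner(self, q):
        # Query similarity assumed to be Euclidean distance to each prototype.
        dists = [np.linalg.norm(q - p) for p in self.prototypes]
        return int(np.argmin(dists))

    def learn(self, query, answer):
        q = np.asarray(query, dtype=float)
        # Seed prototypes lazily from the first k queries.
        if len(self.prototypes) < self.k:
            self.prototypes.append(q.copy())
            self.counts.append(1)
            self.bias.append(float(answer))
            self.slope.append(np.zeros_like(q))
            return
        j = self._winner(q)
        d = q - self.prototypes[j]
        # LMS step on the winning prototype's local linear model.
        err = answer - (self.bias[j] + self.slope[j] @ d)
        self.bias[j] += self.lr_reg * err
        self.slope[j] += self.lr_reg * err * d
        # Move the prototype to the running mean of its assigned queries.
        self.counts[j] += 1
        step = d / self.counts[j]
        self.prototypes[j] += step
        # Re-express the intercept at the new prototype position so the
        # local model's predictions are unchanged by the move.
        self.bias[j] += self.slope[j] @ step

    def predict(self, query):
        if not self.prototypes:
            raise ValueError("no past queries learned yet")
        q = np.asarray(query, dtype=float)
        j = self._winner(q)
        return float(self.bias[j] + self.slope[j] @ (q - self.prototypes[j]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = QueryDrivenCountPredictor(n_prototypes=4)
    for _ in range(2000):
        center = rng.uniform(0.0, 1.0, size=2)
        radius = rng.uniform(0.01, 0.2)
        # Hypothetical stand-in for the COUNT answer returned by a data owner.
        answer = 1000.0 * radius ** 2
        model.learn(np.append(center, radius), answer)
    print(model.predict([0.5, 0.5, 0.1]))
```

Because prototypes and local models are updated one query at a time, the sketch mirrors the incremental, query-driven setting. A new query is answered from the nearest prototype's local model without touching the data itself.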
Pages: 2546-2567
Number of pages: 21
Related papers (50 in total)
  • [1] Learning to Accurately COUNT with Query-Driven Predictive Analytics
    Anagnostopoulos, Christos
    Triantafillou, Peter
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015: 14-23
  • [2] Query-Driven Learning for Next Generation Predictive Modeling & Analytics
    Savva, Fotis
    SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019: 1844-1846
  • [3] Query-Driven Learning for Predictive Analytics of Data Subspace Cardinality
    Anagnostopoulos, Christos
    Triantafillou, Peter
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2017, 11 (04)
  • [4] Query-Driven Approach to Entity Resolution
    Altwaijry, Hotham
    Kalashnikov, Dmitri V.
    Mehrotra, Sharad
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (14): 1846-1857
  • [5] Query-Driven Multi-Instance Learning
    Hsu, Yen-Chi
    Hong, Cheng-Yao
    Lee, Ming-Sui
    Liu, Tyng-Luh
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 4158-4165
  • [6] Query-Driven Approach to Face Clustering and Tagging
    Zhang, Liyan
    Wang, Xikui
    Kalashnikov, Dmitri V.
    Mehrotra, Sharad
    Ramanan, Deva
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (10): 4504-4513
  • [7] QDA: A Query-Driven Approach to Entity Resolution
    Altwaijry, Hotham
    Kalashnikov, Dmitri V.
    Mehrotra, Sharad
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (02): 402-417
  • [8] Query-driven approach of contextual ontology module learning using web snippets
    Ben Mustapha, Nesrine
    Aufaure, Marie-Aude
    Zghal, Hajer Baazaoui
    Ben Ghezala, Henda
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2015, 45 (01): 61-94
  • [9] Query-driven indexing for scalable peer-to-peer text retrieval
    Skobeltsyn, Gleb
    Luu, Toan
    Zarko, Ivana Podnar
    Rajman, Martin
    Aberer, Karl
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2009, 25 (01): 89-99