Learning to Accurately COUNT with Query-Driven Predictive Analytics

Authors
Anagnostopoulos, Christos [1]
Triantafillou, Peter [1]
Affiliations
[1] Univ Glasgow, Sch Comp Sci, Glasgow G12 8QQ, Lanark, Scotland
Keywords
query-driven analytics; range queries; aggregation operators; self-organized map regression
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We study a novel solution for executing aggregation (specifically COUNT) queries over large-scale data. The proposed solution is generally applicable, in the sense that it can be deployed in environments in which data owners may or may not restrict access to their data and allow only 'aggregation operators' to be executed over it. To this end, it is based on predictive analytics driven by queries and their results. We propose a machine learning (ML) framework for the task, which can also be adapted to other aggregates. We focus on the widely used set-cardinality (i.e., COUNT) aggregation operator, as it is fundamental both to internal data system optimisations and to aggregation-query analytics. We contribute a novel, query-driven ML model whose goals are to: (i) learn the query space (access patterns), (ii) associate (complex) aggregation queries with the cardinality of their results, and (iii) define query similarity and use it to predict the cardinality of the answer set of an incoming ad hoc query. Our ML model incorporates incremental learning algorithms to ensure high prediction accuracy even when both the querying patterns and the underlying data change. The significance of our contribution lies in that it (i) is the only query-driven solution applicable over general environments that include restricted-access data, (ii) offers incremental learning adjusted for arriving ad hoc queries, which is well suited to big data analytics, and (iii) offers performance (in terms of prediction accuracy, time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, assessing its sensitivity and its comparative advantages over acclaimed data-centric methods (self-tuning histograms, sampling, and multidimensional histograms).
Pages: 14-23
Number of pages: 10
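Illustrative sketch
The three goals listed in the abstract (learn the query space, associate queries with result cardinalities, predict by query similarity) can be illustrated with a minimal Python sketch. This is not the authors' model: it swaps the self-organized map regression named in the keywords for a simpler winner-take-all online quantizer, and the class `QueryDrivenCounter`, the helper `true_count`, the encoding of a 1-D range query as `(center, radius)`, and all parameter values are assumptions made only for illustration.

```python
import math
import random


class QueryDrivenCounter:
    """Hypothetical sketch: online query prototypes, each with a running
    estimate of the COUNT returned by queries that fell near it."""

    def __init__(self, n_prototypes, lr=0.1):
        self.n_prototypes = n_prototypes
        self.lr = lr              # step size for incremental updates
        self.prototypes = []      # representative points of the query space
        self.counts = []          # cardinality estimate attached to each prototype

    @staticmethod
    def _dist(a, b):
        # Euclidean distance serves as the (assumed) query dissimilarity.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def _nearest(self, q):
        return min(range(len(self.prototypes)),
                   key=lambda i: self._dist(q, self.prototypes[i]))

    def update(self, q, cardinality):
        """Learn from one executed query and the count it returned."""
        if len(self.prototypes) < self.n_prototypes:
            # Seed prototypes with the first observed queries.
            self.prototypes.append(list(q))
            self.counts.append(float(cardinality))
            return
        i = self._nearest(q)
        # Move the winning prototype toward the query ...
        self.prototypes[i] = [p + self.lr * (x - p)
                              for p, x in zip(self.prototypes[i], q)]
        # ... and its cardinality estimate toward the observed count.
        self.counts[i] += self.lr * (cardinality - self.counts[i])

    def predict(self, q):
        """Predict COUNT for an unseen query without touching the data."""
        if not self.prototypes:
            raise ValueError("model has not seen any queries yet")
        return self.counts[self._nearest(q)]


def true_count(data, center, radius):
    # Ground-truth COUNT of a 1-D range query, used only to produce training pairs.
    return sum(1 for x in data if abs(x - center) <= radius)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(2000)]
    model = QueryDrivenCounter(n_prototypes=50)
    for _ in range(3000):
        c, r = rng.random(), rng.uniform(0.01, 0.2)
        model.update((c, r), true_count(data, c, r))
    q = (0.5, 0.1)
    print("predicted:", round(model.predict(q)), "actual:", true_count(data, *q))
```

Once trained, the sketch answers a query from the stored prototypes alone, which is the property that makes a query-driven approach usable when only aggregate results, not raw data, are accessible. Because each update is incremental, estimates can also drift with changing query patterns or data, though a real system would need more than this nearest-prototype rule to do so accurately.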