Large-scale predictive modeling and analytics through regression queries in data management systems

Cited by: 0
Authors
Christos Anagnostopoulos
Peter Triantafillou
Affiliations
[1] University of Glasgow,School of Computing Science
[2] University of Warwick,Department of Computer Science
Keywords
Predictive analytics; Piecewise linear regression learning; Query-driven analytics; Data subspace exploration; Vector regression quantization
DOI
Not available
Abstract
Regression analytics has been the standard approach to modeling the relationship between input and output variables, and recent trends aim to incorporate advanced regression analytics capabilities within data management systems (DMS). Linear regression queries are fundamental to exploratory analytics and predictive modeling. However, computing their exact answers is inefficient and scales poorly. We contribute a novel predictive analytics model and an associated statistical learning methodology that are efficient, scalable, and accurate in discovering piecewise linear dependencies among variables by observing only the regression queries issued to a DMS and their answers. We focus on in-DMS piecewise linear regression, specifically on (i) predicting the answers to mean-value aggregate queries, (ii) identifying the piecewise linear dependencies between variables and delivering them in response to regression queries, and (iii) predicting the dependent data variables within specific data subspaces defined by analysts and data scientists. Our goal is to discover, only through query–answer pairs, a piecewise linear approximation of the underlying data function that is competitive with the best piecewise linear approximation to the ground truth. Our methodology is analyzed, evaluated, and compared with the exact solution and with near-perfect approximations of the underlying relationships among variables, achieving orders-of-magnitude improvement in analytics processing.
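To make the two query types named in the abstract concrete, the following minimal sketch computes their exact answers over a data subspace, which is the costly baseline the abstract says the proposed model is meant to approximate from query–answer pairs. It is not the authors' method. The ball-shaped subspace (a center and a radius), the function names `mean_value_query` and `regression_query`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def mean_value_query(X, y, center, radius):
    """Exact mean of y over the rows of X lying within `radius` of `center`.

    The ball-shaped subspace is an illustrative assumption.
    """
    mask = np.linalg.norm(X - center, axis=1) <= radius
    if not mask.any():
        return None  # empty subspace: no answer
    return float(y[mask].mean())


def regression_query(X, y, center, radius):
    """Exact least-squares fit [intercept, slopes...] over the same subspace."""
    mask = np.linalg.norm(X - center, axis=1) <= radius
    n = int(mask.sum())
    if n < X.shape[1] + 1:
        return None  # too few points to fit a linear model
    A = np.column_stack([np.ones(n), X[mask]])
    coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
    return coef


# Hypothetical toy data: y = |x| is piecewise linear with a break at x = 0.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 1))
y = np.abs(X[:, 0])

# Two subspaces on either side of the break yield different local linear models.
left = regression_query(X, y, np.array([-0.5]), 0.4)
right = regression_query(X, y, np.array([0.5]), 0.4)
avg_right = mean_value_query(X, y, np.array([0.5]), 0.4)
print("left fit:", left, "right fit:", right, "mean on right:", avg_right)
```

Each call scans the full data set, which is why exact answers become expensive at scale; a query-driven model in the spirit of the abstract would instead learn from pairs such as `(center, radius) -> answer`.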
Pages: 17-55
Page count: 38
Related papers
50 in total
  • [1] Large-scale predictive modeling and analytics through regression queries in data management systems
    Anagnostopoulos, Christos
    Triantafillou, Peter
    [J]. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2020, 9 (01) : 17 - 55
  • [2] Predictive modeling of everyday behavior from large-scale data
    Motomura, Yoichi
    [J]. Synthesiology, 2009, 2 (01) : 1 - 12
  • [3] Distributed optimization over large-scale systems for big data analytics
    Shahbazian, Reza
    [J]. 4OR-A QUARTERLY JOURNAL OF OPERATIONS RESEARCH, 2021, 19 (02) : 309 - 310
  • [4] Distributed optimization over large-scale systems for big data analytics
    Reza Shahbazian
    [J]. 4OR, 2021, 19 : 309 - 310
  • [5] A Hybrid Data Model for Large-Scale Analytics
    Feo, John
    [J]. 2018 ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS, 2018 : 269 - 269
  • [6] Conceptual Data Modeling Using Aggregates to Ensure Large-Scale Distributed Data Management Systems Security
    Poltavtseva, Maria A.
    Kalinin, Maxim O.
    [J]. INTELLIGENT DISTRIBUTED COMPUTING XIII, 2020, 868 : 41 - 47
  • [7] Survey of Large-Scale Data Management Systems for Big Data Applications
    Wu, Lengdong
    Yuan, Liyan
    You, Jiahuai
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2015, 30 (01) : 163 - 183
  • [8] Survey of Large-Scale Data Management Systems for Big Data Applications
    Lengdong Wu
    Liyan Yuan
    Jiahuai You
    [J]. Journal of Computer Science and Technology, 2015, 30 : 163 - 183
  • [9] Visual Analytics of Large-Scale Climate Model Data
    Wong, Pak Chung
    Shen, Han-Wei
    Leung, Ruby
    Hagos, Samson
    Lee, Teng-Yok
    Tong, Xin
    Lu, Kewei
    [J]. 2014 IEEE 4TH SYMPOSIUM ON LARGE DATA ANALYSIS AND VISUALIZATION (LDAV), 2014 : 85 - 92
  • [10] Disco: A Computing Platform for Large-Scale Data Analytics
    Mundkur, Prashanth
    Tuulos, Ville
    Flatow, Jared
    [J]. ERLANG 11: PROCEEDINGS OF THE 2011 ACM SIGPLAN ERLANG WORKSHOP, 2011 : 84 - 89