Large-scale predictive modeling and analytics through regression queries in data management systems

Cited by: 0
Authors
Christos Anagnostopoulos
Peter Triantafillou
Affiliations
[1] University of Glasgow, School of Computing Science
[2] University of Warwick, Department of Computer Science
Keywords
Predictive analytics; Piecewise linear regression learning; Query-driven analytics; Data subspace exploration; Vector regression quantization
DOI
Not available
Abstract
Regression analytics has been the standard approach to modeling the relationship between input and output variables, while recent trends aim to incorporate advanced regression analytics capabilities within data management systems (DMS). Linear regression queries are fundamental to exploratory analytics and predictive modeling. However, computing their exact answers leaves a lot to be desired in terms of efficiency and scalability. We contribute a novel predictive analytics model and an associated statistical learning methodology, which are efficient, scalable and accurate in discovering piecewise linear dependencies among variables by observing only the regression queries issued to a DMS and their answers. We focus on in-DMS piecewise linear regression, and specifically on predicting the answers to mean-value aggregate queries, identifying and delivering the piecewise linear dependencies between variables to regression queries, and predicting the data dependent variables within specific data subspaces defined by analysts and data scientists. Our goal is to discover, only through query–answer pairs, a piecewise linear approximation of the underlying data function that is competitive with the best piecewise linear approximation to the ground truth. Our methodology is analyzed, evaluated and compared with the exact solution and near-perfect approximations of the underlying relationships among variables, achieving orders of magnitude improvement in analytics processing.
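The idea in the abstract of learning a piecewise linear approximation from query–answer pairs alone can be illustrated with a small sketch. This is not the authors' model or algorithm: it assumes one-dimensional data, mean-value queries with a fixed radius, and a simple equal-width partition of the query centers in place of the paper's quantization. All names (`mean_value_query`, `fit_piecewise`, `predict`, `LO`, `HI`, `K`) are hypothetical.

```python
import random

def mean_value_query(data, center, radius):
    """Exact answer: mean of y over rows with x in [center - radius, center + radius]."""
    ys = [y for x, y in data if abs(x - center) <= radius]
    return sum(ys) / len(ys) if ys else None

def fit_line(points):
    """Ordinary least squares for y = a + b * x over (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def fit_piecewise(pairs, lo, hi, k):
    """Assumed partition: split [lo, hi] into k equal-width cells, one line per cell."""
    width = (hi - lo) / k
    cells = [[] for _ in range(k)]
    for center, answer in pairs:
        idx = min(int((center - lo) / width), k - 1)
        cells[idx].append((center, answer))
    models = [fit_line(cell) if len(cell) >= 2 else None for cell in cells]
    return models, width

def predict(models, width, lo, center):
    """Predict a query answer from the local line of the cell containing the center."""
    idx = min(max(int((center - lo) / width), 0), len(models) - 1)
    if models[idx] is None:
        return None
    a, b = models[idx]
    return a + b * center

# Synthetic ground truth (an assumption for the demo): y = |x| on a grid over [-4, 4].
data = [(i / 100, abs(i / 100)) for i in range(-400, 401)]

# The learner sees only (query center, exact answer) pairs, never the data itself.
rng = random.Random(0)
RADIUS, LO, HI, K = 0.25, -4.0, 4.0, 4
pairs = []
for _ in range(400):
    c = rng.uniform(-3.5, 3.5)
    pairs.append((c, mean_value_query(data, c, RADIUS)))

models, width = fit_piecewise(pairs, LO, HI, K)
print("predicted:", predict(models, width, LO, 3.0))
print("exact:    ", mean_value_query(data, 3.0, RADIUS))
```

After training, `predict` answers a new mean-value query from the fitted local line without scanning `data`, which is the source of the efficiency gain this sketch is meant to convey. The per-cell slopes (about -1 for negative centers and +1 for positive ones here) are the recovered piecewise linear dependencies.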
Pages: 17-55
Number of pages: 38