AIDA - Abstraction for Advanced In-Database Analytics

Cited by: 16
Authors
D'silva, Joseph Vinish [1]
De Moor, Florestan [1]
Kemme, Bettina [1]
Affiliations
[1] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Volume 11 / Issue 11
DOI
10.14778/3236187.3236194
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the tremendous growth in data science and machine learning, it has become increasingly clear that traditional relational database management systems (RDBMS) lack appropriate support for the programming paradigms required by such applications, whose developers prefer tools that perform the computation outside the database system. While the database community has attempted to integrate some of these tools into the RDBMS, this has not swayed the trend, as existing solutions are often not convenient for the incremental, iterative development approach used in these fields. In this paper, we propose AIDA - an abstraction for advanced in-database analytics. AIDA emulates the syntax and semantics of popular data science packages but transparently executes the required transformations and computations inside the RDBMS. In particular, AIDA works with a regular Python interpreter as a client to connect to the database. Furthermore, it supports the seamless use of both relational and linear algebra operations using a unified abstraction. AIDA relies on the RDBMS engine to efficiently execute relational operations and on an embedded Python interpreter and NumPy to perform linear algebra operations. Data reformatting is done transparently and avoids data copy whenever possible. AIDA does not require changes to statistical packages or the RDBMS, which facilitates portability.
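The idea of one object that serves both relational and linear algebra operations can be illustrated with a minimal sketch. This is not AIDA's API: the class `TabularData`, its methods, and the `trips` table are hypothetical names invented for illustration, and SQLite stands in for the RDBMS. Unlike the system described in the abstract, this sketch runs in the client process and copies rows when converting to a NumPy array.

```python
import sqlite3

import numpy as np


class TabularData:
    """Hypothetical wrapper: relational steps stay in SQL, linear algebra goes to NumPy."""

    def __init__(self, conn, query):
        self.conn = conn
        self.query = query  # SQL text describing this object's rows

    def filter(self, condition):
        # Relational operation: composed into SQL and left to the database engine.
        # (String-built SQL is acceptable only in a sketch; it is not injection-safe.)
        return TabularData(self.conn, f"SELECT * FROM ({self.query}) WHERE {condition}")

    def project(self, *columns):
        # Relational operation: keep only the named columns.
        cols = ", ".join(columns)
        return TabularData(self.conn, f"SELECT {cols} FROM ({self.query})")

    def to_matrix(self):
        # Linear algebra operand: the query runs here and rows are copied into an array.
        rows = self.conn.execute(self.query).fetchall()
        return np.array(rows, dtype=float)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (distance REAL, duration REAL, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(1.0, 5.0, 4.0), (2.0, 9.0, 7.0), (3.0, 14.0, 10.0), (40.0, 60.0, 0.0)],
)

trips = TabularData(conn, "SELECT * FROM trips")
features = trips.filter("fare > 0").project("distance", "duration").to_matrix()
gram = features.T @ features  # linear algebra on relationally prepared data
```

Here the filter and projection are evaluated by the database engine only when `to_matrix()` is called, and the matrix product is computed by NumPy, which mirrors the division of labor the abstract describes.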
Pages: 1400-1413
Page count: 14