NoDB: Efficient Query Execution on Raw Data Files

Cited by: 11
Authors
Alagiannis, Ioannis [1]
Borovica-Gajic, Renata [1]
Branco, Miguel [1]
Idreos, Stratos [2]
Ailamaki, Anastasia [1]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[2] Harvard Univ, Cambridge, MA 02138 USA
Keywords
All Open Access; Green;
DOI
10.1145/2830508
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
As data collections become larger and larger, users are faced with increasing bottlenecks in their data analysis. More data means more time to prepare and to load the data into the database before executing the desired queries. Many applications, for example, scientific data analysis and social networks, already avoid using database systems because of their complexity and the increased data-to-query time, that is, the time between getting the data and retrieving its first useful results. For many applications, data collections keep growing fast, even on a daily basis, and this data deluge will only increase in the future, when we expect to have far more data than we can move or store, let alone analyze. We here present the design and roadmap of a new paradigm in database systems, called NoDB, which does not require data loading while still maintaining the whole feature set of a modern database system. In particular, we show how to make raw data files a first-class citizen, fully integrated with the query engine. Through our design and the lessons learned by implementing the NoDB philosophy over a modern database management system (DBMS), we discuss the fundamental limitations as well as the strong opportunities that such a research path brings. We identify performance bottlenecks specific to in situ processing, namely the repeated parsing and tokenizing overhead and the expensive data type conversion. To address these problems, we introduce an adaptive indexing mechanism that maintains positional information to provide efficient access to raw data files, together with a flexible caching structure. We conclude that NoDB systems are feasible to design and implement over modern DBMSs, bringing an unprecedented positive effect on usability and performance.
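The two mechanisms named in the abstract, positional information for raw files and a cache of converted values, can be illustrated with a toy reader. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `RawCsvTable`, its attributes, and the `rows_tokenized` counter are hypothetical names invented for illustration, and the file is assumed to be plain delimited text with a single-byte delimiter, no quoting, and no header row.

```python
import os
import tempfile


class RawCsvTable:
    """Toy in situ reader (hypothetical): queries a delimited file without loading it."""

    def __init__(self, path, delimiter=","):
        self.path = path
        self.delimiter = delimiter.encode()
        self.positions = {}      # column index -> [(start, end) byte offsets, one per row]
        self.cache = {}          # (column index, converter) -> converted values
        self.rows_tokenized = 0  # counts tokenizing work, for illustration only

    def _build_positions(self, col):
        """Tokenize every row once and remember where column `col` starts and ends."""
        spans = []
        with open(self.path, "rb") as f:
            offset = 0
            for line in f:
                body = line.rstrip(b"\r\n")
                start = 0
                for _ in range(col):
                    # Raises ValueError if the row has too few fields.
                    start = body.index(self.delimiter, start) + len(self.delimiter)
                end = body.find(self.delimiter, start)
                if end == -1:
                    end = len(body)
                spans.append((offset + start, offset + end))
                offset += len(line)
                self.rows_tokenized += 1
        self.positions[col] = spans

    def column(self, col, convert=int):
        """Return column `col`, reusing cached values or stored offsets when available."""
        key = (col, convert)
        if key in self.cache:
            return self.cache[key]          # no file access, no type conversion
        if col not in self.positions:
            self._build_positions(col)      # pay the tokenizing cost once per column
        values = []
        with open(self.path, "rb") as f:
            for start, end in self.positions[col]:
                f.seek(start)               # jump straight to the field
                values.append(convert(f.read(end - start).decode()))
        self.cache[key] = values
        return values


# Usage on a small throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1,10,a\n2,20,b\n3,30,c\n")

table = RawCsvTable(tmp.name)
print(sum(table.column(1)))   # 60, first access tokenizes all three rows
table.cache.clear()           # simulate cache eviction
print(sum(table.column(1)))   # 60, offsets are reused, so no re-tokenizing
print(table.rows_tokenized)   # 3
os.remove(tmp.name)
```

In this sketch the offsets are built lazily, only for columns that queries actually touch, which is one simple way to read "adaptive"; a real system would also have to bound the memory used by both structures.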
Pages: 112-121
Page count: 10
Related papers
50 in total
  • [1] NoDB in Action: Adaptive Query Processing on Raw Data
    Alagiannis, Ioannis
    Borovica, Renata
    Branco, Miguel
    Idreos, Stratos
    Ailamaki, Anastasia
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (12) : 1942 - 1945
  • [2] OPTIMIZING SORT ORDER QUERY EXECUTION IN BALANCED AND NESTED GRID FILES
    MUECK, TA
    SCHAUER, MJ
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1995, 7 (02) : 246 - 260
  • [3] Efficient Execution of Data warehouse query using look ahead matching algorithm
    Prakash, Kale Sarika
    Prathap, P. M. Joe
    [J]. 2016 INTERNATIONAL CONFERENCE ON AUTOMATIC CONTROL AND DYNAMIC OPTIMIZATION TECHNIQUES (ICACDOT), 2016, : 384 - 388
  • [4] On query execution over encrypted data
    Baby, Tinu
    Cherukuri, Aswani Kumar
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2015, 8 (02) : 321 - 331
  • [5] Adaptive Query Processing on RAW Data
    Karpathiotakis, Manos
    Branco, Miguel
    Alagiannis, Ioannis
    Ailamaki, Anastasia
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (12) : 1119 - 1130
  • [6] A Query Service for Raw Sensor Data
    McCann, Donall
    Roantree, Mark
    [J]. SMART SENSING AND CONTEXT, PROCEEDINGS, 2009, 5741 : 38 - 50
  • [7] An adaptive query execution system for data integration
    Ives, ZG
    Florescu, D
    Friedman, M
    Levy, A
    Weld, DS
    [J]. SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999: SIGMOD99: PROCEEDINGS OF THE 1999 ACM SIGMOD - INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 1999, : 299 - 310
  • [8] Cloaking data to ease view creation, query expression, and query execution
    [J]. Murthy, S. (sudarshan.murthy@elseinstitute.org), 1600, Springer Verlag (7260 LNCS):
  • [9] Efficient query execution on broadcasted index tree structures
    Hambrusch, Susanne
    Liu, Chuan-Ming
    Aref, Walid G.
    Prabhakar, Sunil
    [J]. DATA & KNOWLEDGE ENGINEERING, 2007, 60 (03) : 511 - 529
  • [10] OCTOPUS: Efficient Query Execution on Dynamic Mesh Datasets
    Tauheed, Farhan
    Heinis, Thomas
    Schuermann, Felix
    Markram, Henry
    Ailamaki, Anastasia
    [J]. 2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 1000 - 1011