Benchmarking SQL on MapReduce systems using large astronomy databases

Authors
Amin Mesmoudi
Mohand-Saïd Hacid
Farouk Toumani
Affiliations
[1] Université de Lyon
[2] CNRS
[3] Université Lyon 1
[4] LIRIS
[5] UMR5205
[6] Université Blaise Pascal
[7] CNRS
[8] LIMOS - UMR CNRS 6158
Keywords
LSST; DBMS; Benchmark; Distributed systems; MapReduce; SQL
DOI
Not available
Abstract
In the era of big data, with digital information being collected and produced at unprecedented volumes across several application domains, it is becoming increasingly difficult to manage and query large data repositories. In the framework of the PetaSky project (http://com.isima.fr/Petasky), we focus on the problem of managing scientific data in the field of cosmology. The data we consider are those of the LSST project (http://www.lsst.org/). The overall size of the database that will be produced is expected to exceed 60 PB (LSST Data Challenge Handbook, 2012). To evaluate the performance of existing SQL-on-MapReduce data management systems, we conducted extensive experiments using data and queries from the area of cosmology. The goal of this work is to report on the ability of such systems to support large-scale declarative queries. We mainly investigated the impact of data partitioning, indexing, and compression on query execution performance.
Pages: 347 - 378
Number of pages: 31
Source
Distributed and Parallel Databases, 2016, 34 (03) : 347 - 378