Benchmarking SQL on MapReduce systems using large astronomy databases

Authors
Amin Mesmoudi
Mohand-Saïd Hacid
Farouk Toumani
Affiliations
[1] Université de Lyon
[2] CNRS
[3] Université Lyon 1
[4] LIRIS
[5] UMR5205
[6] Université Blaise Pascal
[7] CNRS
[8] LIMOS - UMR CNRS 6158
Keywords
LSST; DBMS; Benchmark; Distributed systems; MapReduce; SQL
Abstract
In the era of big data, with digital information of unprecedented volume being collected and produced across many application domains, managing and querying large data repositories is becoming increasingly difficult. In the framework of the PetaSky project (http://com.isima.fr/Petasky), we focus on the problem of managing scientific data in the field of cosmology. The data we consider are those of the LSST project (http://www.lsst.org/). The overall size of the database that will be produced is expected to exceed 60 PB (LSST Data Challenge Handbook, 2012). To evaluate the performance of existing SQL-on-MapReduce data management systems, we conducted extensive experiments using data and queries from the area of cosmology. The goal of this work is to report on the ability of such systems to support large-scale declarative queries. We mainly investigated the impact of data partitioning, indexing, and compression on query execution performance.
Pages: 347-378
Page count: 31