An Architecture for Data Warehousing in Big Data Environments

Cited by: 5
Authors
Martinho, Bruno [1]
Santos, Maribel Yasmina [1]
Affiliations
[1] Univ Minho, ALGORITMI Res Ctr, Guimaraes, Portugal
Keywords
Big data; Data warehouse; NoSQL; Hadoop; Hive; Impala
DOI
10.1007/978-3-319-49944-4_18
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Recent advances in information technology have increased the capacity to collect and store data, a trend commonly referred to as Big Data. This context raises many challenges, Data Warehousing among them. The main purpose of this work is to propose an architecture for Data Warehousing in Big Data that takes as input a data source stored in a traditional Data Warehouse and transforms it into a Data Warehouse in Hive. Before the architecture was proposed and implemented, a benchmark was conducted to measure the processing times of Hive and Impala and to understand how these technologies could be integrated in an architecture where Hive plays the role of the Data Warehouse and Impala is the driving force for the analysis and visualization of data. The proposed architecture was then implemented using tools such as the Hadoop ecosystem, Talend and Tableau, and validated using a data set with more than 100 million records, obtaining satisfactory results in terms of processing times.
Pages: 237-250
Page count: 14
Related papers
50 in total
  • [1] A Two-Level Architecture for Data Warehousing and OLAP Over Big Data
    Dhaouadi, Asma
    Gammoudi, Mohamed Mohsen
    Hammoudi, Slimane
    [J]. VISION 2025: EDUCATION EXCELLENCE AND MANAGEMENT OF INNOVATIONS THROUGH SUSTAINABLE ECONOMIC COMPETITIVE ADVANTAGE, 2019, : 7182 - 7194
  • [2] Data lineage tracing in data warehousing environments
    Fan, Hao
    [J]. DATA MANAGEMENT: DATA, DATA EVERYWHERE, PROCEEDINGS, 2007, 4587 : 25 - 36
  • [3] Advances in data warehousing and OLAP in the big Data Era
    Bellatreche, Ladjel
    Cuzzocrea, Alfredo
    Song, Il-Yeol
    [J]. INFORMATION SYSTEMS, 2015, 53 : 39 - 40
  • [4] MapReduce Research on Warehousing of Big Data
    Pticek, M.
    Vrdoljak, B.
    [J]. 2017 40TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2017, : 1361 - 1366
  • [5] Survey of Big Data Warehousing Techniques
    Kaur, Jaspreet
    Shedge, Rajashree
    Joshi, Bharti
    [J]. INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES, ICICCT 2019, 2020, 89 : 471 - 481
  • [6] Materialized views in data warehousing environments
    Ciferri, CDD
    de Souza, FD
    [J]. SCCC 2001: XXI INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY, PROCEEDINGS, 2001, : 3 - 12
  • [7] PAUSE: A Privacy Architecture for Heterogeneous Big Data Environments
    Jutla, Dawn N.
    Bodorik, Peter
    [J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 1919 - 1928
  • [8] An architecture for data warehousing supporting data independence and interoperability
    Cabibbo, L
    Torlone, R
    [J]. INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2001, 10 (03) : 377 - 397
  • [9] A cluster architecture for parallel data warehousing
    Dehne, F
    Eavis, T
    Rau-Chaplin, A
    [J]. FIRST IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, PROCEEDINGS, 2001, : 161 - 168
  • [10] Semantic Web Technologies and Big Data Warehousing
    Pticek, M.
    Vrdoljak, B.
    [J]. 2018 41ST INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2018, : 1214 - 1219