Analysis of Big Data Storage Tools for Data Lakes based on Apache Hadoop Platform

Citations: 0
Authors
Belov, Vladimir [1]
Nikulchev, Evgeny [1]
Affiliation
[1] MIREA Russian Technol Univ, Moscow, Russia
Keywords
Big data formats; data lakes; Apache Hadoop; data warehouses
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
When developing large-scale data processing systems, the question of how to store the data arises. Data lakes are one of the modern tools for addressing it, and many data lake implementations use Apache Hadoop as the underlying platform. Hadoop has no default data storage format, so a format must be chosen when the data processing system is designed. That choice should be based on an assessment against several criteria. Experimental evaluation alone, however, does not always give a complete picture of what a particular storage format can do; it is also necessary to study the format's features, internal structure, recommendations for use, and so on. The article describes the features of both widely used data storage formats and those currently gaining popularity.
Pages: 551-557
Number of pages: 7