Analysis of Big Data Storage Tools for Data Lakes based on Apache Hadoop Platform

Times cited: 0
Authors
Belov, Vladimir [1]
Nikulchev, Evgeny [1]
Affiliation
[1] MIREA Russian Technological University, Moscow, Russia
Keywords
Big data formats; data lakes; Apache Hadoop; data warehouses
DOI
10.14569/IJACSA.2021.0120864
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
When developing large-scale data processing systems, the question of how to store the data arises. One modern approach to this problem is the data lake. Many data lake implementations use Apache Hadoop as their base platform. Hadoop has no default data storage format, so a format must be chosen when designing a data processing system. This choice should be based on an assessment against several criteria. However, experimental evaluation alone does not always give a complete picture of what a particular storage format can do. In that case, it is necessary to study the format's features, its internal structure, the recommendations for its use, and so on. The article describes the features of both widely used data storage formats and formats that are currently gaining popularity.
Pages: 551-557
Page count: 7