Analysis of Big Data Storage Tools for Data Lakes based on Apache Hadoop Platform

Cited by: 0
Authors
Belov, Vladimir [1]
Nikulchev, Evgeny [1]
Affiliations
[1] MIREA Russian Technol Univ, Moscow, Russia
Keywords
Big data formats; data lakes; Apache Hadoop; data warehouses
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
When developing large data processing systems, the question of data storage arises. Data lakes are one of the modern tools for solving this problem, and many data lake implementations use Apache Hadoop as the underlying platform. Hadoop has no default data storage format, so a format must be chosen when designing a data processing system. That choice should rest on an assessment against several criteria. Experimental evaluation alone, however, does not always give a complete understanding of what a particular storage format can do, so it is also necessary to study the format's features, its internal structure, the recommendations for its use, and so on. The article describes the features of both widely used data storage formats and formats that are currently gaining popularity.
Pages: 551-557
Page count: 7