Shared Disk Big Data Analytics with Apache Hadoop

Authors
Mukherjee, Anirban [1]
Datta, Joydip [1]
Jorapur, Raghavendra [1]
Singhvi, Ravi [1]
Haloi, Saurav [1]
Akram, Wasim [1]
Affiliations
[1] Symantec Corp, ICON, Pune 411021, Maharashtra, India
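Keywords
BigData; Hadoop; Clustered File Systems; Analytics; Cloud
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract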
Big Data is a term applied to data sets whose size is beyond the ability of traditional software technologies to capture, store, manage, and process within a tolerable elapsed time. The popular assumption about Big Data analytics is that it requires internet-scale scalability: hundreds of compute nodes with attached storage. In this paper, we question the need for a massively scalable distributed computing platform for Big Data analytics in traditional businesses. For organizations that do not need horizontal, internet-scale scalability in their analytics platform, Big Data analytics can be built on top of a traditional POSIX cluster file system employing a shared storage model. In this study, we compared a widely used clustered file system, VERITAS Cluster File System (SF-CFS), with the Hadoop Distributed File System (HDFS) using popular MapReduce benchmarks such as TeraSort, DFS-IO, and GridMix on top of Apache Hadoop. In our experiments, SF-CFS not only matched the performance of HDFS but also outperformed it in many cases. Enterprises can therefore fulfill their Big Data analytics needs with their traditional, existing shared storage model, without migrating to a different storage model in their data centers. This approach also brings other benefits, such as stability and robustness, a rich set of features, and compatibility with traditional analytics applications.
Pages: 6