Shared Disk Big Data Analytics with Apache Hadoop

Cited by: 0

Authors:

Mukherjee, Anirban [1]

Datta, Joydip [1]

Jorapur, Raghavendra [1]

Singhvi, Ravi [1]

Haloi, Saurav [1]

Akram, Wasim [1]

Affiliations:

[1] Symantec Corp, ICON, Pune 411021, Maharashtra, India

Source:

2012 19TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC) | 2012

Keywords:

BigData; Hadoop; Clustered File Systems; Analytics; Cloud;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Big Data is a term applied to data sets whose size is beyond the ability of traditional software technologies to capture, store, manage and process within a tolerable elapsed time. The popular assumption about Big Data analytics is that it requires internet-scale scalability: hundreds of compute nodes with attached storage. In this paper, we debate the need for a massively scalable distributed computing platform for Big Data analytics in traditional businesses. For organizations that do not need horizontal, internet-order scalability in their analytics platform, Big Data analytics can be built on top of a traditional POSIX cluster file system employing a shared storage model. In this study, we compared a widely used clustered file system, VERITAS Cluster File System (SF-CFS), with Hadoop Distributed File System (HDFS) using popular MapReduce benchmarks such as Terasort, DFS-IO and Gridmix on top of Apache Hadoop. In our experiments, VxCFS not only matched the performance of HDFS but also outperformed it in many cases. Enterprises can therefore fulfill their Big Data analytics needs with a traditional, existing shared storage model without migrating to a different storage model in their data centers. This approach also brings other benefits, such as stability and robustness, a rich set of features and compatibility with traditional analytics applications.


Pages: 6
