Shared Disk Big Data Analytics with Apache Hadoop

Cited by: 0

Authors:

Mukherjee, Anirban [1]

Datta, Joydip [1]

Jorapur, Raghavendra [1]

Singhvi, Ravi [1]

Haloi, Saurav [1]

Akram, Wasim [1]

Affiliation:

[1] Symantec Corp, ICON, Pune 411021, Maharashtra, India

Source:

2012 19TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC) | 2012

Keywords:

BigData; Hadoop; Clustered File Systems; Analytics; Cloud

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Big Data is a term applied to data sets whose size is beyond the ability of traditional software technologies to capture, store, manage and process within a tolerable elapsed time. The popular assumption about Big Data analytics is that it requires internet-scale scalability: hundreds of compute nodes with attached storage. In this paper, we debate the need for a massively scalable distributed computing platform for Big Data analytics in traditional businesses. For organizations that do not need horizontal, internet-order scalability in their analytics platform, Big Data analytics can be built on top of a traditional POSIX cluster file system employing a shared storage model. In this study, we compared a widely used clustered file system, VERITAS Cluster File System (SF-CFS), with Hadoop Distributed File System (HDFS) using popular MapReduce benchmarks such as Terasort, DFS-IO and Gridmix on top of Apache Hadoop. In our experiments, VxCFS not only matched the performance of HDFS but also outperformed it in many cases. In this way, enterprises can fulfill their Big Data analytics needs with a traditional, existing shared storage model without migrating to a different storage model in their data centers. This approach also brings other benefits, such as stability and robustness, a rich set of features, and compatibility with traditional analytics applications.

Pages: 6
