Shared Disk Big Data Analytics with Apache Hadoop

Cited by: 0
Authors
Mukherjee, Anirban [1]
Datta, Joydip [1]
Jorapur, Raghavendra [1]
Singhvi, Ravi [1]
Haloi, Saurav [1]
Akram, Wasim [1]
Affiliation
[1] Symantec Corp, ICON, Pune 411021, Maharashtra, India
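Keywords
BigData; Hadoop; Clustered File Systems; Analytics; Cloud
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract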
Big Data is a term applied to data sets whose size is beyond the ability of traditional software technologies to capture, store, manage, and process within a tolerable elapsed time. The popular assumption about Big Data analytics is that it requires internet-scale scalability: hundreds of compute nodes with attached storage. In this paper, we question the need for a massively scalable distributed computing platform for Big Data analytics in traditional businesses. For organizations that do not need horizontal, internet-order scalability in their analytics platform, Big Data analytics can be built on top of a traditional POSIX cluster file system that employs a shared storage model. In this study, we compared a widely used clustered file system, VERITAS Cluster File System (SF-CFS), with the Hadoop Distributed File System (HDFS) using popular MapReduce benchmarks such as Terasort, DFS-IO, and Gridmix on top of Apache Hadoop. In our experiments, VxCFS not only matched the performance of HDFS but also outperformed it in many cases. Enterprises can therefore meet their Big Data analytics needs with a traditional, existing shared storage model, without migrating to a different storage model in their data centers. This approach also brings other benefits, such as stability and robustness, a rich set of features, and compatibility with traditional analytics applications.
Pages: 6