Analysis of Big Data Platform with OpenStack and Hadoop

Cited by: 5
Authors
Li, Xiaoyan [1]
Lu, Zhihui [1]
Wang, Nini [2]
Wu, Jie [2]
Huang, Shalin [3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Minist Educ, Engn Res Ctr Cyber Secur Auditing & Monitoring, Shanghai 200433, Peoples R China
[3] Wangsu Sci & Technol Co Ltd, Shanghai 200433, Peoples R China
Keywords
Hadoop; Benchmarks; Big data; HDFS; Cluster; OpenStack; Cloud; Performance; MapReduce
DOI
10.1007/978-3-319-49178-3_29
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
In the era of big data, cloud infrastructure needs to provide strong support for big data workloads. As a distributed computational framework, Hadoop is one of the de facto leading software tools for solving big data problems. Cloud infrastructure has been proven to support three-tier architecture applications well. In this paper, we construct a Hadoop big data platform on an OpenStack cloud. We design three experimental scenarios, carry out a set of experiments using the standard Hadoop benchmarks TestDFSIO, TeraSort and PI, and examine the performance. Our experiments reveal that the disk read operations of physical servers can be a bottleneck for TestDFSIO and TeraSort. Spreading VMs across more physical servers achieves better performance for the read jobs of TestDFSIO and TeraSort. For the CPU-intensive PI job, the best practice is to concentrate the VMs on fewer physical machines.
Pages: 375-390
Page count: 16
Related papers
50 in total
  • [41] Distributed Case-based Reasoning System Based on Big Data Platform Hadoop
    Wang, Chong-Yang
    Wang, Hong-Bing
    Liang, Yan-Rui
    [J]. 2015 INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND INFORMATION SYSTEM (SEIS 2015), 2015, : 629 - 634
  • [42] Big data and Spark: Comparison with Hadoop
    Benlachmi, Yassine
    Hasnaoui, Moulay Lahcen
    [J]. PROCEEDINGS OF THE 2020 FOURTH WORLD CONFERENCE ON SMART TRENDS IN SYSTEMS, SECURITY AND SUSTAINABILITY (WORLDS4 2020), 2020, : 811 - 817
  • [43] Handling Big Data with Hadoop Toolkit
    Devakunchari, R.
    [J]. 2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,
  • [44] Big Data and Hadoop - A Technological Survey
    Manwal, Manika
    Gupta, Amit
    [J]. 2017 INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN COMPUTING AND COMMUNICATION TECHNOLOGIES (ICETCCT), 2017, : 268 - 273
  • [45] Hadoop: Addressing Challenges of Big Data
    Singh, Kamalpreet
    Kaur, Ravinder
    [J]. SOUVENIR OF THE 2014 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2014, : 686 - 689
  • [46] A Review on Big Data and Hadoop Security
    Khaloufi, Hayat
    Beni-Hssane, Abderrahim
    Abouelmehdi, Karim
    Saadi, Mostafa
    [J]. Networked Systems, NETYS 2016, 2016, 9944 : 386 - 386
  • [47] Role of Hadoop in Big Data Handling
    Meenakshi
    Ramachandra, A. C.
    Thippeswamy, M. N.
    Bailakare, Ajith
    [J]. INTERNATIONAL CONFERENCE ON INTELLIGENT DATA COMMUNICATION TECHNOLOGIES AND INTERNET OF THINGS, ICICI 2018, 2019, 26 : 482 - 491
  • [48] Distributed Data Platform System Based on Hadoop Platform
    Guo, Jianwei
    Du, Liping
    Li, Ying
    Zhao, Guifen
    Jiya, Jiang
    [J]. PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSAIT 2013), 2014, 255 : 533 - 539
  • [49] Human Resource Decision-Making and Recommendation Based on Hadoop Distributed Big Data Platform
    Chen, Weiling
    Du, Chunjing
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
  • [50] DESIGN AND IMPLEMENTATION OF VEHICLE SCHEDULING OPTIMIZATION FOR SMART LOGISTICS PLATFORM POWERED BY HADOOP BIG DATA
    Yu, Guangtian
    Yu, Wangtianhua
    [J]. SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2023, 24 (04): : 755 - 767