GPFS-SNC: An enterprise cluster file system for Big Data

Cited by: 9
Authors
Jain, R. [1]
Sarkar, P. [1]
Subhraveti, D. [1]
Affiliations
[1] Almaden Res Ctr, IBM Res Div, San Jose, CA 95120 USA
DOI
10.1147/JRD.2013.2243531
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A new class of data-intensive applications commonly referred to as Big Data applications (e.g., customer sentiment analysis based on click-stream logs) involves processing massive amounts of data with a focus on semantically transforming the data. This class of applications is massively parallel and well suited for the MapReduce programming framework that allows users to perform large-scale data analyses such that the application execution layer handles the system architecture, data partitioning, and task scheduling. In this paper, we introduce GPFS-SNC (General Parallel File System for Shared Nothing Clusters), a scalable file system that operates over a cluster of commodity machines and direct-attached storage and meets the requirements of analytics and traditional applications that are typically used together in analytics solutions. The architecture extends an existing enterprise cluster file system to support these emerging classes of workloads by applying five innovative optimizations: 1) locality awareness to allow compute jobs to be scheduled on nodes where the data resides, 2) metablocks that allow large and small block sizes to co-exist in the same file system to meet the needs of different types of applications, 3) write affinity that allows applications to dictate the layout of files on different nodes in order to maximize both write and read bandwidth, 4) pipelined replication to maximize use of network bandwidth for data replication, and 5) distributed recovery to minimize the effect of failures on ongoing computation.
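To make the first optimization concrete, the following is a minimal sketch of locality-aware task placement: each task is assigned to a node that already holds a replica of its input block, and falls back to any free node otherwise. It is an illustration only, not the scheduler described in the paper; the function name `assign_tasks`, the block and node identifiers, and the greedy policy are all assumptions made for this example.

```python
def assign_tasks(block_locations, node_slots):
    """Greedy locality-aware placement (hypothetical illustration).

    block_locations: dict mapping block id -> list of nodes holding a replica
    node_slots:      dict mapping node -> number of free task slots
    Returns a dict mapping block id -> (node, is_local).
    """
    free = dict(node_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, replicas in block_locations.items():
        # Prefer a node that stores a replica and still has a free slot.
        local = [n for n in replicas if free.get(n, 0) > 0]
        if local:
            node = max(local, key=lambda n: free[n])
            is_local = True
        else:
            # No local slot is free: fall back to any node with capacity,
            # which implies reading the block over the network.
            candidates = [n for n, slots in free.items() if slots > 0]
            if not candidates:
                raise RuntimeError("no free task slots in the cluster")
            node = max(candidates, key=lambda n: free[n])
            is_local = False
        free[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n1"]}
    slots = {"n1": 2, "n2": 1, "n3": 1}
    for block, (node, is_local) in assign_tasks(blocks, slots).items():
        print(block, node, "local" if is_local else "remote")
```

A greedy pass like this can use up a node's slots on blocks that had other local options; a production scheduler would typically weigh such trade-offs, which this sketch does not attempt.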
Pages: 10
Related papers
  • [1] GPFS-SNC: An enterprise storage framework for virtual-machine clouds
    Gupta, K.
    Jain, R.
    Koltsidas, I.
    Pucha, H.
    Sarkar, P.
    Seaman, M.
    Subhraveti, D.
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2011, 55 (06)
  • [2] GPFS: A shared-disk file system for large computing clusters
    Schmuck, F
    Baskin, R
    USENIX ASSOCIATION PROCEEDINGS OF THE FAST'02 CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2002, : 231 - 244
  • [3] Hadoop Distributed File System for Big data analysis
    Almansouri, Hatim Talal
    Masmoudi, Youssef
    PROCEEDINGS OF 2019 IEEE 4TH WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS' 19), 2019, : 257 - 261
  • [4] Methods of enterprise electronic file content information mining under big data environment
    Peng, Fang
    Wang, Honggang
    Zhuang, Li
    Wang, Minnan
    Yang, Chengyue
    2020 INTERNATIONAL CONFERENCE ON BIG DATA & ARTIFICIAL INTELLIGENCE & SOFTWARE ENGINEERING (ICBASE 2020), 2020, : 5 - 8
  • [5] XPMFS: A New NVM File System for Vehicle Big Data
    Niu, Dejiao
    He, Qingjian
    Cai, Tao
    Chen, Bo
    Zhan, Yongzhao
    Liang, Jun
    IEEE ACCESS, 2018, 6 : 34863 - 34873
  • [6] Big Data Management System for the harmonization of enterprise model
    Sabitha, M. S.
    Viayalakshmi, S.
    Sre, R. M. Rathikaa
    2016 INTERNATIONAL CONFERENCE ON COMPUTING TECHNOLOGIES AND INTELLIGENT DATA ENGINEERING (ICCTIDE'16), 2016,
  • [7] Improving data availability for a cluster file system through replication
    Xiong, Jin
    Li, Jianyu
    Tang, Rongfeng
    Hu, Yiming
    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, 2008, : 306 - +
  • [8] GDedup: Distributed File System Level Deduplication for Genomic Big Data
    Bartus, Paul
    Arzuaga, Emmanuel
    2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS), 2018, : 120 - 127
  • [9] Computer Performance Determination System Based on Big Data Distributed File
    Lu, Kong
    CYBER SECURITY INTELLIGENCE AND ANALYTICS, 2020, 928 : 877 - 884
  • [10] An approach for Big Data Security based on Hadoop Distributed File system
    Mahmoud, Hadeer
    Hegazy, Abdelfatah
    Khafagy, Mohamed H.
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN COMPUTER ENGINEERING (ITCE' 2018), 2018, : 109 - 114