GPFS-SNC: An enterprise cluster file system for Big Data

Cited by: 9
Authors
Jain, R. [1]
Sarkar, P. [1]
Subhraveti, D. [1]
Affiliation
[1] Almaden Res Ctr, IBM Res Div, San Jose, CA 95120 USA
DOI
10.1147/JRD.2013.2243531
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
A new class of data-intensive applications commonly referred to as Big Data applications (e.g., customer sentiment analysis based on click-stream logs) involves processing massive amounts of data with a focus on semantically transforming the data. This class of applications is massively parallel and well suited for the MapReduce programming framework that allows users to perform large-scale data analyses such that the application execution layer handles the system architecture, data partitioning, and task scheduling. In this paper, we introduce GPFS-SNC (General Parallel File System for Shared Nothing Clusters), a scalable file system that operates over a cluster of commodity machines and direct-attached storage and meets the requirements of analytics and traditional applications that are typically used together in analytics solutions. The architecture extends an existing enterprise cluster file system to support these emerging classes of workloads by applying five innovative optimizations: 1) locality awareness to allow compute jobs to be scheduled on nodes where the data resides, 2) metablocks that allow large and small block sizes to co-exist in the same file system to meet the needs of different types of applications, 3) write affinity that allows applications to dictate the layout of files on different nodes in order to maximize both write and read bandwidth, 4) pipelined replication to maximize use of network bandwidth for data replication, and 5) distributed recovery to minimize the effect of failures on ongoing computation.
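The abstract names the five optimizations without describing how they work. The Python block that follows is a toy model meant only to make items 1 to 4 concrete. It is not GPFS-SNC's implementation. The block size, the metablock factor, round-robin striping, ring-ordered replica chains, and the names `FileLayout`, `schedule`, and `replication_steps` are all hypothetical. In the model, a metablock groups consecutive small blocks so that placement and scheduling can work at a coarse granularity while the small block size remains available (items 1 and 2). Write affinity pins a file's first copies to one node (item 3). A simplified unit-time network model shows why forwarding data along a replica pipeline (item 4) uses bandwidth better than having the writer send every copy itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative parameters (assumptions, not GPFS-SNC defaults).
BLOCK_SIZE = 1 << 20         # 1 MiB "small" block for traditional workloads
BLOCKS_PER_METABLOCK = 64    # 64 consecutive blocks form one 64 MiB metablock


@dataclass
class FileLayout:
    """Toy model of one file's placement across a shared-nothing cluster."""
    nodes: List[str]
    replicas: int = 3
    affinity_node: Optional[str] = None  # write affinity: keep first copies on this node
    chains: Dict[int, List[str]] = field(default_factory=dict)

    def locate(self, offset: int) -> tuple:
        """Map a byte offset to (block index, metablock index)."""
        block = offset // BLOCK_SIZE
        return block, block // BLOCKS_PER_METABLOCK

    def chain(self, metablock: int) -> List[str]:
        """Replica chain for a metablock. In this model every block of a metablock
        shares the same chain, so a task reading the whole metablock can stay local."""
        if metablock not in self.chains:
            if self.affinity_node is not None:
                first = self.nodes.index(self.affinity_node)
            else:
                first = metablock % len(self.nodes)  # wide striping, round-robin
            # Remaining replicas go to the next nodes around the ring (assumed policy).
            self.chains[metablock] = [
                self.nodes[(first + i) % len(self.nodes)] for i in range(self.replicas)
            ]
        return self.chains[metablock]


def schedule(layout: FileLayout, offsets: List[int], free: List[str]) -> Dict[int, str]:
    """Locality-aware scheduling: give each split a free node that holds a replica of
    its metablock, else any free node (a remote read). Assumes len(offsets) <= len(free)."""
    plan = {}
    pool = list(free)  # copy so the caller's list is not mutated
    for off in offsets:
        _, mb = layout.locate(off)
        local = [n for n in layout.chain(mb) if n in pool]
        node = local[0] if local else pool[0]
        pool.remove(node)
        plan[off] = node
    return plan


def replication_steps(packets: int, replicas: int, pipelined: bool) -> int:
    """Time units to copy `packets` packets to `replicas` nodes, assuming one packet
    per link per time unit. In a pipeline each replica forwards to the next while it
    is still receiving; without one, the writer's single uplink sends every copy."""
    return packets + replicas - 1 if pipelined else packets * replicas


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3"]
    striped = FileLayout(nodes)
    print(striped.locate(130 * BLOCK_SIZE))  # (130, 2)
    print(striped.chain(2))                  # ['n2', 'n3', 'n0']
    pinned = FileLayout(nodes, affinity_node="n1")
    print(pinned.chain(5))                   # ['n1', 'n2', 'n3']
    print(schedule(striped, [0, 64 * BLOCK_SIZE], nodes))  # {0: 'n0', 67108864: 'n1'}
    print(replication_steps(100, 3, True), replication_steps(100, 3, False))  # 102 300
```

Distributed recovery (item 5) is not modelled. The sketch only covers placement, scheduling, and the cost of replication under the stated assumptions.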
Pages: 10
Related papers
  • [31] Design of cluster safe file system
    Zheng, G
    Ding, K
    He, ZX
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2003, 2834 : 249 - 253
  • [32] Fast probabilistic file fingerprinting for big data
    Tretyakov, Konstantin
    Laur, Sven
    Smant, Geert
    Vilo, Jaak
    Prins, Pjotr
    BMC GENOMICS, 2013, 14
  • [33] Research on encoding transmission of big data file
    School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China
    Unknown
    Jisuanji Gongcheng, 2006, 19 (120-122):
  • [35] A High Performance Cluster File System with Standard Network File System Interface
    Lu, Jun
    Du, Bin
    Zhu, Yi
    Ren, Ly
    Li, Daiw
    2009 INTERNATIONAL FORUM ON INFORMATION TECHNOLOGY AND APPLICATIONS, VOL 1, PROCEEDINGS, 2009, : 397 - +
  • [36] A novel cluster parallel file system
    Wei, Wenguo
    Dong, Shoubin
    Zhang, Ling
    Li, Jialin
    SEVENTEENTH INTERNATIONAL CONFERENCE ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2006, : 119 - +
  • [37] Performance analysis of a cluster file system
    Du, G
    Xu, ZW
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V, 2000, : 1933 - 1938
  • [38] Design System Construction of Intelligent Enterprise Management System Based on Big Data Technology
    Xu, Yan
    2020 5TH INTERNATIONAL CONFERENCE ON SMART GRID AND ELECTRICAL AUTOMATION (ICSGEA 2020), 2020, : 363 - 366
  • [39] A Novel Meta-Data Synchronous Write Mechanism for Cluster File System
    Lu, Jun
    Zhu, Yi
    Du, Bin
    Ren, Ly
    Li, Rui
    2009 INTERNATIONAL FORUM ON INFORMATION TECHNOLOGY AND APPLICATIONS, VOL 1, PROCEEDINGS, 2009, : 401 - +
  • [40] ENTERPRISE CONTROLLING IN THE CONTEXT OF BIG DATA
    Lajos, Branislav
    AKTUALNE PROBLEMY PODNIKOVEJ SFERY 2015, 2015, : 341 - 347