GPFS-SNC: An enterprise cluster file system for Big Data

Cited by: 9
Authors
Jain, R. [1]
Sarkar, P. [1]
Subhraveti, D. [1]
Affiliation
[1] Almaden Research Center, IBM Research Division, San Jose, CA 95120 USA
DOI
10.1147/JRD.2013.2243531
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A new class of data-intensive applications, commonly referred to as Big Data applications (e.g., customer sentiment analysis based on click-stream logs), processes massive amounts of data with a focus on semantically transforming it. These applications are massively parallel and well suited to the MapReduce programming framework, which lets users perform large-scale data analyses while the execution layer handles the system architecture, data partitioning, and task scheduling. In this paper, we introduce GPFS-SNC (General Parallel File System for Shared Nothing Clusters), a scalable file system that runs on a cluster of commodity machines with direct-attached storage. It meets the requirements of both analytics applications and traditional applications, which are typically used together in analytics solutions. The architecture extends an existing enterprise cluster file system to support these emerging workloads through five optimizations:
1) locality awareness, which allows compute jobs to be scheduled on the nodes where their data resides;
2) metablocks, which allow large and small block sizes to coexist in the same file system to meet the needs of different types of applications;
3) write affinity, which allows applications to dictate the layout of files across nodes in order to maximize both write and read bandwidth;
4) pipelined replication, which maximizes use of network bandwidth for data replication; and
5) distributed recovery, which minimizes the effect of failures on ongoing computation.
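The following Python sketch shows how four of the ideas listed above could fit together: metablock addressing, write-affinity placement, pipelined replication, and locality-aware scheduling. It is a toy model under stated assumptions, not GPFS-SNC's implementation. The 1 MiB block size, the 64-block metablock width, the replica count of 3, and the ring-order placement policy are all hypothetical. The function and class names (locate, Layout, pipeline_hops, schedule) are likewise invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical parameters for illustration only; not taken from the paper.
BLOCK_SIZE = 1 << 20        # 1 MiB "small" block for ordinary applications
BLOCKS_PER_METABLOCK = 64   # 64 consecutive blocks form one 64 MiB metablock


def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset to (metablock index, block index within that metablock)."""
    block = offset // BLOCK_SIZE
    return block // BLOCKS_PER_METABLOCK, block % BLOCKS_PER_METABLOCK


@dataclass
class Layout:
    nodes: list[str]
    replicas: int = 3
    placement: dict[int, list[str]] = field(default_factory=dict)

    def place(self, metablock: int, writer: str) -> list[str]:
        """Write affinity (assumed policy): the first replica goes on the writing
        node and the remaining replicas on the next distinct nodes in ring order."""
        if self.replicas > len(self.nodes):
            raise ValueError("not enough nodes for the requested replica count")
        start = self.nodes.index(writer)
        chain = [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]
        self.placement[metablock] = chain
        return chain


def pipeline_hops(chain: list[str], client: str) -> list[tuple[str, str]]:
    """Pipelined replication: the client sends the data once, and each replica
    forwards it to the next one, so no single link carries every copy.
    A local write (client == first replica) needs no network hop."""
    hops = []
    prev = client
    for node in chain:
        if node != prev:
            hops.append((prev, node))
        prev = node
    return hops


def schedule(tasks: dict[str, int], layout: Layout, free: set[str]) -> dict[str, str]:
    """Locality awareness: run each task (mapped to its input metablock) on a free
    node holding a replica; otherwise fall back to any free node. Tasks left
    unassigned when no node is free simply wait."""
    free = set(free)  # do not mutate the caller's set
    assignment = {}
    for task, metablock in tasks.items():
        local = [n for n in layout.placement.get(metablock, []) if n in free]
        if local:
            pick = local[0]
        elif free:
            pick = sorted(free)[0]
        else:
            break
        assignment[task] = pick
        free.discard(pick)
    return assignment
```

For example, with `layout = Layout(nodes=["n1", "n2", "n3", "n4"])`, calling `layout.place(0, "n1")` yields the chain `["n1", "n2", "n3"]`, and `pipeline_hops` for that chain sends data only `n1 → n2 → n3`. Because the metablock groups many small blocks on the same nodes, a map task reading a whole metablock can run where its data lives. Other applications can still address data at the small-block granularity returned by `locate`.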
Pages: 10