GPFS-SNC: An enterprise cluster file system for Big Data

Cited by: 9
Authors
Jain, R. [1 ]
Sarkar, P. [1 ]
Subhraveti, D. [1 ]
Affiliations
[1] Almaden Res Ctr, IBM Res Div, San Jose, CA 95120 USA
DOI
10.1147/JRD.2013.2243531
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
A new class of data-intensive applications commonly referred to as Big Data applications (e.g., customer sentiment analysis based on click-stream logs) involves processing massive amounts of data with a focus on semantically transforming the data. This class of applications is massively parallel and well suited for the MapReduce programming framework that allows users to perform large-scale data analyses such that the application execution layer handles the system architecture, data partitioning, and task scheduling. In this paper, we introduce GPFS-SNC (General Parallel File System for Shared Nothing Clusters), a scalable file system that operates over a cluster of commodity machines and direct-attached storage and meets the requirements of analytics and traditional applications that are typically used together in analytics solutions. The architecture extends an existing enterprise cluster file system to support these emerging classes of workloads by applying five innovative optimizations: 1) locality awareness to allow compute jobs to be scheduled on nodes where the data resides, 2) metablocks that allow large and small block sizes to co-exist in the same file system to meet the needs of different types of applications, 3) write affinity that allows applications to dictate the layout of files on different nodes in order to maximize both write and read bandwidth, 4) pipelined replication to maximize use of network bandwidth for data replication, and 5) distributed recovery to minimize the effect of failures on ongoing computation.
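As a rough illustration of the first optimization listed above (locality awareness), the following Python sketch places each task on a node that already holds a replica of its input block and falls back to a remote node only when no such node has capacity. It is not the GPFS-SNC interface or the authors' scheduler: the function `assign_tasks`, its parameters, and the greedy policy are all assumptions made for this example.

```python
def assign_tasks(block_replicas, slots_per_node):
    """Greedy locality-aware placement (hypothetical, for illustration only).

    block_replicas: dict mapping block id -> list of nodes holding a replica
    slots_per_node: dict mapping node -> number of tasks the node may run
    Returns a dict mapping block id -> (node, is_local).
    """
    free = dict(slots_per_node)  # copy so the caller's dict is not mutated
    placement = {}
    for block, replicas in block_replicas.items():
        # Prefer nodes that store the block and still have a free slot.
        local = [n for n in replicas if free.get(n, 0) > 0]
        if local:
            node = max(local, key=lambda n: free[n])
            is_local = True
        else:
            # No local capacity: pick any node with a free slot,
            # accepting that the block must be read over the network.
            remote = [n for n in free if free[n] > 0]
            if not remote:
                raise RuntimeError("no free slots left in the cluster")
            node = max(remote, key=lambda n: free[n])
            is_local = False
        free[node] -= 1
        placement[block] = (node, is_local)
    return placement


if __name__ == "__main__":
    replicas = {"b1": ["n1"], "b2": ["n1"], "b3": ["n2"]}
    print(assign_tasks(replicas, {"n1": 2, "n2": 1}))
```

A greedy pass like this is simple but not optimal; a real scheduler would also weigh load, rack topology, and replica choice.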
Pages: 10