A Framework for Multitasking Data-Intensive Management Services in High Performance Computing Environments

Authors
Kulasekaran, Sivakumar [1, 3]
Esteva, Maria [1, 3]
Trelogan, Jessica [2, 3]
Liu, Si [1, 3]
Affiliations
[1] Texas Adv Comp Ctr, Houston, TX 77054 USA
[2] Inst Class Archaeol, London, England
[3] Univ Texas Austin, Austin, TX 78712 USA
Keywords
Multitasking data management services; high performance computing; archaeology data; data intensive computing
DOI
10.1109/BigDataService.2015.42
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Data management entails a continuum of tasks to develop sustainable and reusable collections throughout their lifecycle. Large collections with complex data formats and structures may require what we define as "multitasking data management," involving a combination of manual and automated iterative tasks. When conducted in a desktop computing environment by curators, these tasks can be labor-intensive and disruptive of research. While the process can be made much more efficient within a Data-Intensive High Performance Computing (DIC/HPC) infrastructure, it remains a challenge to implement generalizable services so that automated workflows can be easily performed by non-expert users. This paper introduces a framework for automating data management activities as data-intensive computing jobs within a multitasking workflow. Using as a case study a set of legacy data from an archaeological collection in need of reorganization, we identified the steps required to re-sort and move approximately 27,000 data files into a structured collection architecture. Because not all data management workflows are the same, and because there is a wide range of requirements for job submission within data-intensive HPC resources, we derived a set of generalizable modules that can be used as a guide for curators and HPC consultants. This framework may accommodate collections with different data types and data management requirements and can be conducted by curators trained in HPC usage but without extensive computational expertise. Upon testing, we implemented the framework as a service on a DIC/HPC cluster.
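The re-sort-and-move step described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the function names `plan_moves` and `split_into_batches`, the rule of grouping files by extension, and the round-robin batching are all assumptions made for illustration, and the record does not describe the paper's actual sorting rules or job layout. The sketch only builds a move plan and divides it into per-task batches; it does not touch the file system.

```python
from pathlib import PurePosixPath


def plan_moves(paths, dest_root):
    """Pair each legacy path with a destination under dest_root.

    Assumption: files are grouped by lowercase extension. A real
    collection would use its own curatorial rules instead.
    """
    plan = []
    for p in paths:
        src = PurePosixPath(p)
        category = src.suffix.lstrip(".").lower() or "unknown"
        dest = PurePosixPath(dest_root) / category / src.name
        plan.append((str(src), str(dest)))
    return plan


def split_into_batches(plan, n_tasks):
    """Deal the planned moves round-robin into n_tasks batches,
    one batch per (hypothetical) parallel job task."""
    batches = [[] for _ in range(n_tasks)]
    for i, move in enumerate(plan):
        batches[i % n_tasks].append(move)
    return batches


if __name__ == "__main__":
    legacy = ["legacy/a/IMG_001.TIF", "legacy/b/notes.txt", "legacy/README"]
    for task_id, batch in enumerate(split_into_batches(plan_moves(legacy, "collection"), 2)):
        print(task_id, batch)
```

Keeping the plan separate from the execution lets a curator review the proposed destinations before any file is moved, which matches the abstract's mix of manual and automated tasks.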
Pages: 333-340
Page count: 8