Introducing distributed dynamic data-intensive (D3) science: Understanding applications and infrastructure

Cited by: 5
Authors
Jha, Shantenu [1]
Katz, Daniel S. [2]
Luckow, Andre [1]
Hong, Neil Chue [3]
Rana, Omer [4]
Simmhan, Yogesh [5]
Affiliations
[1] Rutgers State Univ, New Brunswick, NJ 08901 USA
[2] Univ Illinois, Champaign, IL USA
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[4] Cardiff Univ, Cardiff, S Glam, Wales
[5] Indian Inst Sci, Bengaluru, Karnataka, India
Source
Funding
National Science Foundation (USA);
Keywords
dynamic; distributed; data intensive; scientific applications; PROJECT;
DOI
10.1002/cpe.4032
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
A common feature across many science and engineering applications is the amount and diversity of data and computation that must be integrated to yield insights. Datasets are growing larger and becoming distributed; their location, availability, and properties are often time-dependent. Collectively, these characteristics give rise to dynamic distributed data-intensive applications. While "static" data applications have received significant attention, the characteristics, requirements, and software systems for the analysis of large volumes of dynamic, distributed data, and data-intensive applications have received relatively less attention. This paper surveys several representative dynamic distributed data-intensive application scenarios, provides a common conceptual framework to understand them, and examines the infrastructure used in support of applications.
Pages: 25
Related papers
50 in total
  • [1] Understanding performance of distributed data-intensive applications
    Miceli, Christopher
    Miceli, Michael
    Rodriguez-Milla, Bety
    Jha, Shantenu
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2010, 368 (1926): 4089-4102
  • [2] Protocols and services for distributed data-intensive science
    Allcock, W
    Foster, I
    Tuecke, S
    Chervenak, A
    Kesselman, C
    ADVANCED COMPUTING AND ANALYSIS TECHNIQUES IN PHYSICS RESEARCH, 2001, 583: 161-163
  • [3] Data Grids: a new computational infrastructure for data-intensive science
    Avery, P
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY OF LONDON SERIES A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2002, 360 (1795): 1191-1209
  • [4] Citus: Distributed PostgreSQL for Data-Intensive Applications
    Cubukcu, Umur
    Erdogan, Ozgun
    Pathak, Sumedh
    Sannakkayala, Sudhakar
    Slot, Marco
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021: 2490-2502
  • [5] Globus toolkit support for distributed data-intensive science
    Allcock, W
    Chervenak, A
    Foster, I
    Pearlman, L
    Welch, V
    Wilde, M
    PROCEEDINGS OF CHEP 2001, 2001: 692-695
  • [6] 3D Flash Memory for Data-intensive Applications
    Inaba, Satoshi
    2018 IEEE 10TH INTERNATIONAL MEMORY WORKSHOP (IMW), 2018: 1-4
  • [7] CoLoc: Distributed Data and Container Colocation for Data-Intensive Applications
    Renner, Thomas
    Thamsen, Lauritz
    Kao, Odej
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 3008-3015
  • [8] NSM: A distributed storage architecture for data-intensive applications
    Ali, Z
    Malluhi, Q
    20TH IEEE/11TH NASA GODDARD CONFERENCE ON MASS STORAGE AND TECHNOLOGIES (MSST 2003), PROCEEDINGS, 2003: 87-91
  • [9] Decoupling computation and data scheduling in distributed data-intensive applications
    Ranganathan, K
    Foster, I
    11TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, PROCEEDINGS, 2002: 352-358
  • [10] MapReduce Across Distributed Clusters for Data-intensive Applications
    Wang, Lizhe
    Tao, Jie
    Marten, Holger
    Streit, Achim
    Khan, Samee U.
    Kolodziej, Joanna
    Chen, Dan
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012: 2004-2011