Modeling and Optimizing Large-Scale Wide-Area Data Transfers

Cited by: 11
Authors
Kettimuthu, Rajkumar [1, 2]
Vardoyan, Gayane [1]
Agrawal, Gagan [2]
Sadayappan, P. [2]
Affiliations
[1] Argonne Natl Lab, Math & Comp Sci Div, Argonne, IL 60439 USA
[2] Ohio State Univ, Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
wide-area data transfer; GridFTP; modeling data transfer; bandwidth allocation
DOI
10.1109/CCGrid.2014.114
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data generated by experimental, simulation, and observational science is growing exponentially. The resulting datasets are often transported over wide-area networks for storage, analysis, or visualization. Network bandwidth, which is not increasing at the same rate as dataset sizes, is becoming a key obstacle to data-driven sciences. In this paper, we focus on how bandwidth allocation can be controlled at the level of a protocol such as GridFTP, in view of goals such as maintaining certain priorities or performing scheduling with specified objectives. In particular, we explore how GridFTP transfer performance can be controlled by using parallelism and concurrency. We find that concurrency is a more powerful control knob than parallelism. For a source where most bandwidth is consumed by transfers to a small number of other destinations, we build a model for each destination's achieved throughput in terms of its concurrency and total concurrency (over GridFTP transfers) to other major destinations. We then enhance this model by including an indicator of the time-varying external load, using multiple ways to measure this external load. We study the effectiveness of the proposed models in controlling the bandwidth allocation. After evaluating the numerous combinations of models and methods of measuring external load, we narrow in on the four best-performing ones, based on both their validation results and their applicability. After extensive testing of these four approaches, we find that they can obtain desired bandwidth allocations with a mean (median) error rate of 19.8% (13.8%), with 38% of the errors in our benchmark tests being less than 10% and 54% of them being less than 15%.
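The error summary quoted at the end of the abstract (mean, median, and the share of errors under 10% and 15%) can be illustrated with a minimal sketch. The abstract does not define its error rate, so the code assumes it is the absolute deviation of the achieved throughput from the desired allocation, as a percentage of the desired allocation. The function names and the sample throughput values are hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def relative_errors(targets, achieved):
    """Percentage deviation of each achieved throughput from its target.

    Assumption: error = 100 * |achieved - target| / target.
    """
    if len(targets) != len(achieved):
        raise ValueError("targets and achieved must have the same length")
    return [100.0 * abs(a - t) / t for t, a in zip(targets, achieved)]


def summarize(errors, thresholds=(10.0, 15.0)):
    """Mean, median, and percentage of errors strictly below each threshold."""
    summary = {"mean": mean(errors), "median": median(errors)}
    for th in thresholds:
        below = sum(e < th for e in errors)
        summary[f"share_below_{th:g}"] = 100.0 * below / len(errors)
    return summary


# Hypothetical desired vs. achieved throughputs (arbitrary units).
targets = [100.0, 200.0, 400.0, 500.0]
achieved = [95.0, 230.0, 380.0, 400.0]

errors = relative_errors(targets, achieved)
summary = summarize(errors)
print(summary)
```

With real benchmark data, the same two functions would yield figures comparable in form to those reported in the abstract, provided the assumed error definition matches the one the authors used.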
Pages: 196-205
Number of pages: 10