Scaling Reliably: Improving the Scalability of the Erlang Distributed Actor Platform

Cited by: 5
Authors
Trinder, Phil [1]
Chechina, Natalia [1]
Papaspyrou, Nikolaos [2]
Sagonas, Konstantinos [3, 4]
Thompson, Simon [5]
Adams, Stephen [5]
Aronis, Stavros [3]
Baker, Robert [5]
Bihari, E. V. A. [6]
Boudeville, Olivier [7, 9]
Cesarini, Francesco [6]
Di Stefano, Maurizio [5]
Eriksson, Sverker [8]
Fordos, Viktoria [6]
Ghaffari, Amir [1]
Giantsios, Aggelos [2]
Green, Rickard [8]
Hoch, Csaba [6]
Klaftenegger, David [3]
Li, Huiqing [5]
Lundin, Kenneth [8]
Mackenzie, Kenneth [1]
Roukounaki, Katerina [2]
Tsiouris, Yiannis [2]
Winblad, Kjell [3]
Affiliations
[1] Univ Glasgow, Sch Comp Sci, Glasgow G12 8QQ, Lanark, Scotland
[2] Natl Tech Univ Athens, Sch Elect & Comp Engn, Polytechnioupoli, Athens 15780, Greece
[3] Uppsala Univ, Dept Informat Technol, Box 337, SE-75105 Uppsala, Sweden
[4] Natl Tech Univ Athens, Athens, Greece
[5] Univ Kent, Sch Comp, Canterbury CT2 7NF, Kent, England
[6] Erlang Solut, 14 Gowers Walk, London E1 8PY, England
[7] Electricite France, Paris, France
[8] Ericsson AB, S-16483 Kista, Sweden
[9] EDF R&D, F-92140 Clamart, France
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Erlang; scalability; reliability;
DOI
10.1145/3107937
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202; 0835;
Abstract
Distributed actor languages are an effective means of constructing scalable reliable systems, and the Erlang programming language has a well-established and influential model. While the Erlang model conceptually provides reliable scalability, it has some inherent scalability limits and these force developers to depart from the model at scale. This article establishes the scalability limits of Erlang systems and reports the work of the EU RELEASE project to improve the scalability and understandability of the Erlang reliable distributed actor model. We systematically study the scalability limits of Erlang and then address the issues at the virtual machine, language, and tool levels. More specifically: (1) We have evolved the Erlang virtual machine so that it can work effectively in large-scale single-host multicore and NUMA architectures. We have made important changes and architectural improvements to the widely used Erlang/OTP release. (2) We have designed and implemented Scalable Distributed (SD) Erlang libraries to address language-level scalability issues and provided and validated a set of semantics for the new language constructs. (3) To make large Erlang systems easier to deploy, monitor, and debug, we have developed and made open source releases of five complementary tools, some specific to SD Erlang. Throughout the article we use two case studies to investigate the capabilities of our new technologies and tools: a distributed hash table based Orbit calculation and Ant Colony Optimisation (ACO). Chaos Monkey experiments show that two versions of ACO survive random process failure and hence that SD Erlang preserves the Erlang reliability model. While we report measurements on a range of NUMA and cluster architectures, the key scalability experiments are conducted on the Athos cluster with 256 hosts (6,144 cores).
Even for programs with no global recovery data to maintain, SD Erlang partitions the network to reduce network traffic and hence improves performance of the Orbit and ACO benchmarks above 80 hosts. ACO measurements show that maintaining global recovery data dramatically limits scalability; however, scalability is recovered by partitioning the recovery data. We exceed the established scalability limits of distributed Erlang, and do not reach the limits of SD Erlang for these benchmarks at this scale (256 hosts, 6,144 cores).
Pages: 46