Evaluating Scalable Distributed Erlang for Scalability and Reliability

Cited by: 7
Authors
Chechina, Natalia [1]
MacKenzie, Kenneth [1]
Thompson, Simon [3]
Trinder, Phil [2]
Boudeville, Olivier [4]
Fordos, Viktoria [5]
Hoch, Csaba [5]
Ghaffari, Amir [1]
Hernandez, Mario Moro [1]
Affiliations
[1] Univ Glasgow, Glasgow G12 8QQ, Lanark, Scotland
[2] Univ Glasgow, Comp Sci, Glasgow G12 8QQ, Lanark, Scotland
[3] Univ Kent, Sch Comp, Log & Computat, Canterbury CT2 7NZ, Kent, England
[4] EDF R&D, SINETICS Dept, F-92141 Clamart, France
[5] Erlang Solut AB, H-1093 Budapest, Hungary
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Scalability; reliability; actors; Erlang
DOI
10.1109/TPDS.2017.2654246
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Large-scale servers with hundreds of hosts and tens of thousands of cores are becoming common. To exploit these platforms, software must be both scalable and reliable, and distributed actor languages like Erlang are a proven technology in this area. While distributed Erlang conceptually supports the engineering of large-scale reliable systems, in practice it has scalability limits that force developers to depart from the standard language mechanisms at scale. In earlier work we explored these scalability limitations and addressed them by providing a Scalable Distributed (SD) Erlang library that partitions the network of Erlang Virtual Machines (VMs) into scalable groups (s_groups). This paper presents the first systematic evaluation of SD Erlang s_groups and associated tools, and how they can be used. We present a comprehensive evaluation of the scalability and reliability of SD Erlang using three typical benchmarks and a case study. We demonstrate that s_groups improve the scalability of reliable and unreliable Erlang applications on up to 256 hosts (6,144 cores). We show that SD Erlang preserves the class-leading distributed Erlang reliability model, but scales far better than the standard model. We present a novel, systematic, and tool-supported approach for refactoring distributed Erlang applications into SD Erlang. We outline the new and improved monitoring, debugging, and deployment tools for large-scale SD Erlang applications. We demonstrate the scaling characteristics of key tools on systems comprising up to 10K Erlang VMs.
Pages: 2244-2257
Number of pages: 14
    Yang, Shuqiang
    [J]. TRUSTWORTHY COMPUTING AND SERVICES (ISCTCS 2014), 2015, 520 : 189 - 196