Monitoring and control of large-scale distributed systems

Times cited: 0
Author
Legrand, C. [1, 2]
Affiliations
[1] CALTECH, Pasadena, CA 91125 USA
[2] CERN, European Org Nucl Res, Geneva, Switzerland
DOI
10.3254/978-1-61499-643-9-101
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
An important part of managing large-scale distributed computing systems is a monitoring service that can monitor and track, in real time, many site facilities, networks, and tasks in progress. The monitoring information gathered is essential for developing the required higher-level services (the components that provide decision support and some degree of automated decision-making) and for maintaining and optimizing workflows in large-scale distributed systems. To satisfy the demands of data-intensive applications, our strategy was to move toward more synergistic relationships among the applications, the computing and storage facilities, and the network infrastructure. These orchestration and global optimization functions are performed by higher-level, agent-based services that are able to collaborate and cooperate in performing a wide range of distributed information-gathering and processing tasks.
Pages: 101-151
Number of pages: 51