Managing the Hidden Costs of Coordination

Cited by: 2
Author
Maguire, Laura M. D. [1 ,2 ]
Affiliations
[1] Ohio State Univ, Cognit Syst Engn Lab, Columbus, OH 43210 USA
[2] SNAFU Catchers Consortium, Columbus, OH USA
DOI
10.1145/3379989
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
It started with 502 errors. Almost immediately a flood of user reports swamped the service's community Slack channel. A user posted "Getting 502s?" at 9:22 A.M., and within minutes 40 other users responded with the Yes and MeToo emojis. Also at 9:22 A.M., in an ops channel, an incident had been opened by an on-call engineer, and the site reliability engineers responsible for the service had been paged out. By 9:23 A.M. five responders were checking logs and dashboards. At 9:25 A.M., less than two minutes after an initial tentative question indicated there may be an issue, the first notification was pushed out to users. This was aimed at slowing the influx of user reports from the 77,000-plus user community.
Pages: 90-96
Number of pages: 7