Lessons learned from development and operation of the K computer

Cited by: 2
Author
Shoji, Fumiyoshi [1]
Affiliation
[1] RIKEN AICS, Operat & Comp Technol Div, Chuo Ku, 7-1-26,Minatojima Minami Machi, Kobe, Hyogo, Japan
Keywords
The K computer; Operation improvement; Failure analysis; Parallel file system
DOI
10.1016/j.parco.2017.03.001
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
We report operational experiences with the K computer, one of the most powerful supercomputers in the world. The K computer achieved excellent results for system availability, job-filling rate, and failure rate. However, approximately 70% of the unscheduled system stop time was caused by file system failures. We analyzed the causes of these failures and found that the massive and complex configuration of the file system was one of the crucial factors. This scale and complexity exposed many latent bugs in the file system software, and those bugs caused many failures that severely affected operation. (C) 2017 Elsevier B.V. All rights reserved.
Pages: 12-19
Page count: 8
    [J]. SMART STRUCTURES AND MATERIALS 2000: SMART ELECTRONICS AND MEMS, 2000, 3990 : 95 - 102