Optimizing the data-collection time of a large-scale data-acquisition system through a simulation framework

Cited by: 4
Authors
Colombo, Tommaso [1, 2]
Froening, Holger [2]
Javier Garcia, Pedro [3]
Vandelli, Wainer [1]
Affiliations
[1] CERN, Dept Phys, Geneva, Switzerland
[2] Heidelberg Univ, Inst Tech Informat ZITI, Mannheim, Germany
[3] Univ Castilla La Mancha, Dept Sistemas Informat, Albacete, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2016 / Vol. 72 / Issue 12
Keywords
Data acquisition; Network; Data collection; Latency; ATLAS; Incast; Ethernet; TCP;
DOI
10.1007/s11227-016-1764-1
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The ATLAS detector at CERN records particle collision "events" delivered by the Large Hadron Collider. Its data-acquisition system identifies, selects, and stores interesting events in near real-time, with an aggregate throughput of several tens of GB/s. It is a distributed software system executed on a farm of roughly 2000 commodity worker nodes communicating via TCP/IP on an Ethernet network. Event data fragments are received from the many detector readout channels and are buffered, collected together, analyzed, and either stored permanently or discarded. This system, and data-acquisition systems in general, are sensitive to the latency of the data transfer from the readout buffers to the worker nodes. Challenges affecting this transfer include the many-to-one communication pattern and the inherently bursty nature of the traffic. The main performance issues brought about by this workload are addressed in this paper, focusing in particular on the so-called TCP incast pathology. Since performing systematic studies of these issues is often impeded by operational constraints related to the mission-critical nature of these systems, we developed a simulation model of the ATLAS data-acquisition system. The resulting simulation tool is based on the well-established, widely used OMNeT++ framework. This tool was successfully validated by comparing the obtained simulation results with existing measurements of the system's behavior. Furthermore, the simulation tool enables the study of the theoretical behavior of the system in numerous what-if scenarios and with modifications that are not immediately applicable to the real system. In this paper, we take advantage of this to analyze the behavior of the system using different traffic shaping and scheduling policies, and with network hardware modifications. This analysis leads to conclusions that could be used to devise future system enhancements.
Pages: 4546 - 4572
Page count: 27
Related papers
  • [1] Optimizing the data-collection time of a large-scale data-acquisition system through a simulation framework
    Tommaso Colombo
    Holger Fröning
    Pedro Javier García
    Wainer Vandelli
    [J]. The Journal of Supercomputing, 2016, 72 : 4546 - 4572
  • [2] DATA-COLLECTION AND DISPLAY SYSTEM FOR A LARGE-SCALE SIMULATION
    BROWN, AE
    SCANDALE, JS
    SPARROW, DP
    PHILLIPS, CE
    [J]. COMPUTER JOURNAL, 1972, 15 (02): 105 - &
  • [3] LARGE-SCALE DATA-ACQUISITION SYSTEMS
    ZIGLER, GL
    [J]. EXPERIMENTAL MECHANICS, 1978, 18 (05): N40 - N40
  • [4] A SYSTEM FOR LARGE-SCALE IMAGE MAPPING AND GIS DATA-COLLECTION
    PRIES, RA
    [J]. PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 1995, 61 (05): 503 - 511
  • [5] Modeling a Large Data-Acquisition Network in a Simulation Framework
    Colombo, Tommaso
    Froening, Holger
    Garcia, Pedro Javier
    Vandelli, Wainer
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 809 - 816
  • [6] EXPERIENCE WITH GENERATION OUTAGE DATA-COLLECTION PROCEDURES ON A LARGE-SCALE SYSTEM
    CUCCHI, GA
    PRATZON, DJ
    WITMER, FP
    [J]. IEEE TRANSACTIONS ON POWER APPARATUS AND SYSTEMS, 1976, 95 (02): 542 - 549
  • [7] Buffer Provisioning for Large-Scale Data-Acquisition Systems
    Santos, Alejandro
    Vandelli, Wainer
    Javier Garcia, Pedro
    Froening, Holger
    [J]. DEBS'18: PROCEEDINGS OF THE 12TH ACM INTERNATIONAL CONFERENCE ON DISTRIBUTED AND EVENT-BASED SYSTEMS, 2018, : 100 - 111
  • [8] Large-scale Windows 95-based data-acquisition system using LabVIEW
    Mandrake, L
    Gekelman, W
    [J]. COMPUTERS IN PHYSICS, 1997, 11 (05): 498 - 507
  • [9] Doctoral Symposium: Buffering Strategies for Large-Scale Data-Acquisition Systems
    Santos, Alejandro
    [J]. DEBS'18: PROCEEDINGS OF THE 12TH ACM INTERNATIONAL CONFERENCE ON DISTRIBUTED AND EVENT-BASED SYSTEMS, 2018, : 270 - 273
  • [10] A new large-scale data acquisition system
    Qin, JM
    Muller-Schloer, C
    Brehm, J
    [J]. ICEMI '97 - CONFERENCE PROCEEDINGS: THIRD INTERNATIONAL CONFERENCE ON ELECTRONIC MEASUREMENT & INSTRUMENTS, 1997, : 299 - 302