Event metadata records as a testbed for scalable data mining

Cited by: 3
Authors
van Gemmeren, P. [1 ]
Malon, D. [1 ]
Affiliation
[1] Argonne Natl Lab, Argonne, IL 60439 USA
DOI
10.1088/1742-6596/219/4/042057
Chinese Library Classification
O57 [Nuclear physics, high energy physics]
Subject classification code
070202
Abstract
At a data rate of 200 hertz, event metadata records ("TAGs," in ATLAS parlance) provide fertile ground for developing and evaluating tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise "data mining," but our interest is different. Statistical methods and tools such as classification, association rule mining, and cluster analysis are widely used outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. Because the TAG schema is fixed and relatively simple, exporting TAGs to other storage technologies such as HDF5 is straightforward. This simplifies the task of exploiting very-large-scale parallel platforms, such as Argonne National Laboratory's BlueGene/P (at the time of writing, the largest supercomputer in the world for open science), in the development of scalable data mining tools. Using a domain-neutral scientific data format may also let us take advantage of existing data mining components from other communities. There is also a substantial literature on one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for developing and evaluating scalable data mining tools.
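The abstract mentions one-pass algorithms that can sit in the event processing chain, and parallel platforms that split the work across many nodes. As a minimal sketch of that combination (not the authors' code), the example below computes a running mean and variance of one TAG attribute in a single pass using Welford's algorithm, then merges partial results from separate partitions with the pairwise combination formula of Chan et al. The `TagRecord` fields are hypothetical placeholders, not the actual ATLAS TAG schema, and `RunningStats` and `summarize` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRecord:
    # Hypothetical fixed-schema TAG fields, for illustration only.
    run_number: int
    event_number: int
    n_jets: int
    missing_et_gev: float


class RunningStats:
    """One-pass mean/variance (Welford), mergeable across partitions (Chan et al.)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Each value is seen exactly once; no record needs to be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine statistics computed independently, e.g. on different nodes.
        out = RunningStats()
        out.n = self.n + other.n
        if out.n == 0:
            return out
        delta = other.mean - self.mean
        out.mean = self.mean + delta * other.n / out.n
        out.m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / out.n
        return out

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


def summarize(records) -> RunningStats:
    # Single pass over a stream (or partition) of TAG records.
    stats = RunningStats()
    for record in records:
        stats.update(record.missing_et_gev)
    return stats


# Usage: two "workers" each summarize a partition, then the results are merged.
values = [2, 4, 4, 4, 5, 5, 7, 9]
records = [TagRecord(1, i, i % 4, float(v)) for i, v in enumerate(values)]
left = summarize(records[:3])
right = summarize(records[3:])
combined = left.merge(right)
print(combined.n, combined.mean, combined.variance)
```

The merge step is what makes this style of statistic attractive on a partitioned dataset: each partition is scanned once, and the partial summaries can be combined in any order without revisiting the records.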
Pages: 5