Event metadata records as a testbed for scalable data mining

Cited by: 3
Authors
van Gemmeren, P. [1]
Malon, D. [1]
Affiliation
[1] Argonne Natl Lab, Argonne, IL 60439 USA
DOI
10.1088/1742-6596/219/4/042057
Chinese Library Classification
O57 [Nuclear physics, high-energy physics]
Discipline code
070202
Abstract
At a data rate of 200 hertz, event metadata records ("TAGs," in ATLAS parlance) provide fertile ground for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise "data mining," but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.
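The abstract's point about one-pass algorithms can be illustrated with a minimal sketch. It assumes a hypothetical flat, TAG-like record with a few scalar attributes (the class `TagRecord` and its field names are illustrative only, not the actual ATLAS TAG schema), and it uses Welford's online mean/variance algorithm as one standard example of a one-pass statistic. It is not presented as the method used in the paper.

```python
import math
from dataclasses import dataclass
from typing import Iterable


# Hypothetical TAG-like record with a fixed, flat schema of scalar attributes.
# These field names are illustrative assumptions, not the real ATLAS TAG schema.
@dataclass(frozen=True)
class TagRecord:
    run_number: int
    event_number: int
    n_jets: int
    missing_et: float  # assumed units: GeV


class RunningStats:
    """Welford's one-pass algorithm for the running mean and sample variance."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each value is seen exactly once; no records are buffered.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); undefined for n < 2.
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan


def summarize(records: Iterable[TagRecord], field: str) -> RunningStats:
    """Accumulate statistics for one attribute in a single pass over a stream."""
    stats = RunningStats()
    for rec in records:
        stats.update(float(getattr(rec, field)))
    return stats


if __name__ == "__main__":
    # A generator stands in for a record stream that can only be read once.
    stream = (
        TagRecord(1, i, n_jets, met)
        for i, (n_jets, met) in enumerate([(2, 10.0), (3, 20.0), (4, 30.0)])
    )
    s = summarize(stream, "missing_et")
    print(s.n, s.mean, s.variance)  # 3 20.0 100.0
```

The accumulator's state is a constant three numbers regardless of how many records pass through, which is the property that makes accumulators of this kind candidates for insertion at points in a processing or distribution chain without storing the records themselves.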
Pages: 5
Related papers (50 in total)
  • [1] Data mining in metadata repositories
    Arotaritei, D
    DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS AND TECHNOLOGY IV, 2002, 4730: 69-76
  • [2] Incorporating metadata into data mining with ontology
    Li, Guoqi
    Shenw, Huanye
    Fan, Xun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D(06): 983-985
  • [3] Scalable visualization of event data
    Taylor, DJ
    Halim, N
    Hellerstein, JL
    Ma, S
    SERVICES MANAGEMENT IN INTELLIGENT NETWORKS, PROCEEDINGS, 2000, 1960: 47-58
  • [4] Scalable Mining of Big Data
    Leung, Carson K.
    Pazdor, Adam G. M.
    Zheng, Hao
    2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021: 240-247
  • [5] A Scalable Complex Event Analytical System with Incremental Episode Mining over Data Streams
    Tseng, Jerry C. C.
    Gu, Jia-Yuan
    Tseng, Vincent S.
    Wang, P. F.
    Chen, Ching-Yu
    Li, Chu-Feng
    2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016: 648-655
  • [6] Attribute metadata for relational OLAP and data mining
    Merrett, TH
    DATABASE PROGRAMMING LANGUAGES, 2002, 2397: 97-118
  • [7] Mining Building Metadata by Data Stream Comparison
    Holmegaard, Emil
    Kjaergaard, Mikkel Baun
    2016 IEEE CONFERENCE ON TECHNOLOGIES FOR SUSTAINABILITY (SUSTECH), 2016
  • [8] An extensible infrastructure for querying and mining event-level metadata in ATLAS
    Malon, D.
    Cranshaw, J.
    Zhang, Q.
    INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS 2012 (CHEP2012), PTS 1-6, 2012, 396
  • [9] Scalable feature mining for sequential data
    Lesh, Neal
    Zaki, Mohammed J.
    Ogihara, Mitsunori
    IEEE Intelligent Systems and Their Applications, 2000, 15(02): 48-56
  • [10] A scalable data mining architecture for bioinformation
    Li, R
    Zhang, Z
    Cao, S
    Zhu, Y
    Li, Y
    DATA MINING IV, 2004, 7: 583-592