Event metadata records as a testbed for scalable data mining

Cited by: 3
Authors
van Gemmeren, P. [1 ]
Malon, D. [1 ]
Affiliations
[1] Argonne Natl Lab, Argonne, IL 60439 USA
DOI
10.1088/1742-6596/219/4/042057
Chinese Library Classification
O57 [Nuclear physics, high energy physics]
Subject classification code
070202
Abstract
At a data rate of 200 hertz, event metadata records ("TAGs," in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise "data mining," but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.
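As an illustration of the one-pass algorithms the abstract mentions, the following is a minimal sketch of Welford's online mean and variance computed over a stream of event records. It is not taken from the paper: the record layout, the field name `n_jets`, and the helper names `RunningStats` and `summarize` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """One-pass (Welford) accumulator for mean and sample variance."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Each value is seen exactly once and then discarded.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; not defined for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


def summarize(records: Iterable[dict], field: str) -> RunningStats:
    """Consume a stream of records once, accumulating stats for one field."""
    stats = RunningStats()
    for record in records:
        stats.update(record[field])
    return stats


# Hypothetical records: "n_jets" is an illustrative field name,
# not an actual ATLAS TAG attribute.
events = ({"n_jets": n} for n in [2, 4, 4, 4, 5, 5, 7, 9])
stats = summarize(events, "n_jets")
print(stats.count, stats.mean, stats.variance)
```

Because the accumulator holds only three numbers, memory use does not grow with the number of records, which is the property that lets a tool of this kind sit at an arbitrary point in a processing chain.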
Pages: 5