Event metadata records as a testbed for scalable data mining

Cited by: 3
Authors
van Gemmeren, P. [1]
Malon, D. [1]
Affiliation
[1] Argonne Natl Lab, Argonne, IL 60439 USA
DOI
10.1088/1742-6596/219/4/042057
Chinese Library Classification (CLC)
O57 [Nuclear physics; high-energy physics]
Subject classification code
070202
Abstract
At a data rate of 200 hertz, event metadata records ("TAGs," in ATLAS parlance) provide fertile ground for the development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise "data mining," but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high-energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.
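The abstract mentions one-pass and stream-mining algorithms that could be inserted into the event processing chain. As an illustration only, the Python sketch below implements the Misra-Gries frequent-items algorithm, one classic one-pass technique, on a hypothetical stream of fired-trigger labels, one per event record. The `MisraGries` class, the trigger names, and the choice of this particular algorithm are assumptions for illustration; the paper's abstract does not say which algorithms were used.

```python
from collections.abc import Hashable, Iterable


class MisraGries:
    """One-pass frequent-items sketch (Misra-Gries) keeping at most k - 1 counters.

    Any item occurring more than n / k times in a stream of n items is
    guaranteed to remain among the retained keys. Retained counts are lower
    bounds that underestimate the true counts by at most n / k.
    """

    def __init__(self, k: int) -> None:
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.counters: dict[Hashable, int] = {}
        self.n = 0  # number of items (event records) seen so far

    def update(self, item: Hashable) -> None:
        self.n += 1
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[item] = 1
        else:
            # No free counter: decrement every counter (the new item is
            # implicitly discarded) and drop counters that reach zero.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def consume(self, items: Iterable[Hashable]) -> "MisraGries":
        for item in items:
            self.update(item)
        return self

    def candidates(self) -> dict[Hashable, int]:
        return dict(self.counters)


if __name__ == "__main__":
    # Hypothetical stream of fired-trigger labels, one per event record;
    # the names are illustrative only, not taken from real ATLAS TAGs.
    stream = ["EF_mu10"] * 50 + ["EF_e15"] * 30 + ["EF_j20", "EF_tau12"] * 5
    sketch = MisraGries(k=3).consume(stream)
    print(sketch.candidates())  # {'EF_mu10': 40, 'EF_e15': 20}
```

Memory stays bounded by k - 1 counters however long the stream runs, which is what makes a sketch of this kind plausible to attach at an intermediate stage of a processing chain. A second pass (or an exact counter restricted to the surviving candidates) would be needed to recover exact frequencies.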
Pages: 5
Related papers (50 in total)
  • [31] Scalable, Reliable and Robust Data Mining Infrastructures
    Pawar, Shrikant
    Stanam, Aditya
    PROCEEDINGS OF THE 2020 FOURTH WORLD CONFERENCE ON SMART TRENDS IN SYSTEMS, SECURITY AND SUSTAINABILITY (WORLDS4 2020), 2020, : 123 - 125
  • [32] On Scalable Data Mining Techniques for Earth Science
    Goetz, Markus
    Richerzhagen, Matthias
    Bodenstein, Christian
    Cavallaro, Gabriele
    Glock, Philipp
    Riedel, Morris
    Benediktsson, Jon Atli
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2015 COMPUTATIONAL SCIENCE AT THE GATES OF NATURE, 2015, 51 : 2188 - 2197
  • [33] Parallel scalable infrastructure for OLAP and data mining
    Northwestern Univ, Evanston, United States
    Proc Int Database Eng Appl Symp, : 178 - 186
  • [34] AITION: A Scalable Platform for Interactive Data Mining
    Dimitropoulos, Harry
    Kllapi, Herald
    Metaxas, Omiros
    Oikonomidis, Nikolas
    Sitaridi, Eva
    Tsangaris, Manolis M.
    Ioannidis, Yannis
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2012, 2012, 7338 : 646 - 651
  • [35] Meta-MapReduce for scalable data mining
    Liu X.
    Wang X.
    Matwin S.
    Japkowicz N.
    J. Big Data, 1 (1):
  • [36] SPRINT: A scalable parallel classifier for data mining
    Shafer, J
    Agrawal, R
    Mehta, M
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 1996, : 544 - 555
  • [37] Scalable parallel clustering for data mining on multicomputers
    Foti, D
    Lipari, D
    Pizzuti, C
    Talia, D
    PARALLEL AND DISTRIBUTED PROCESSING, PROCEEDINGS, 2000, 1800 : 390 - 398
  • [38] New challenges and roles of metadata in text/data mining in statistics
    Soltés, D
    Knowledge Mining, 2005, 185 : 191 - 199
  • [39] Mining the Web for generating thematic metadata from textual data
    Huang, CC
    Chuang, SL
    Chien, LF
    20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004, : 834 - 834
  • [40] Scalable metadata definition frameworks
    Plante, R
    TOWARD AN INTERNATIONAL VIRTUAL OBSERVATORY, 2004, : 106 - 111