Towards an automated data cleaning with deep learning in CRESST

Authors
G. Angloher
S. Banik
D. Bartolot
G. Benato
A. Bento
A. Bertolini
R. Breier
C. Bucci
J. Burkhart
L. Canonica
A. D’Addabbo
S. Di Lorenzo
L. Einfalt
A. Erb
F. v. Feilitzsch
N. Ferreiro Iachellini
S. Fichtinger
D. Fuchs
A. Fuss
A. Garai
V. M. Ghete
S. Gerster
P. Gorla
P. V. Guillaumon
S. Gupta
D. Hauff
M. Ješkovský
J. Jochum
M. Kaznacheeva
A. Kinast
H. Kluck
H. Kraus
M. Lackner
A. Langenkämper
M. Mancuso
L. Marini
L. Meyer
V. Mokina
A. Nilima
M. Olmi
T. Ortmann
C. Pagliarone
L. Pattavina
F. Petricca
W. Potzel
P. Povinec
F. Pröbst
F. Pucci
F. Reindl
D. Rizvanovic
Affiliations
[1] Max-Planck-Institut für Physik,Faculty of Mathematics, Physics and Informatics
[2] Institut für Hochenergiephysik der Österreichischen Akademie der Wissenschaften,Department of Physics
[3] Atominstitut,LIBPhys
[4] Technische Universität Wien,UC, Departamento de Fisica
[5] INFN,Dipartimento di Ingegneria Civile e Meccanica
[6] Laboratori Nazionali del Gran Sasso
[7] Comenius University
[8] Physik-Department
[9] Technische Universität München
[10] Eberhard-Karls-Universität Tübingen
[11] University of Oxford
[12] Universidade de Coimbra
[13] Walther-Meißner-Institut für Tieftemperaturforschung
[14] GSSI-Gran Sasso Science Institute
[15] Università degli Studi di Cassino e del Lazio Meridionale
Abstract
The CRESST experiment employs cryogenic calorimeters for the sensitive measurement of nuclear recoils induced by dark matter particles. The recorded signals need to undergo a careful cleaning process to avoid wrongly reconstructed recoil energies caused by pile-up and read-out artefacts. We frame this process as a time series classification task and propose to automate it with neural networks. With a data set of over one million labeled records from 68 detectors, recorded between 2013 and 2019 by CRESST, we test the capability of four commonly used neural network architectures to learn the data cleaning task. Our best-performing model achieves a balanced accuracy of 0.932 on our test set. We show on an exemplary detector that about half of the wrongly predicted events are in fact wrongly labeled events, and a large share of the remaining ones have a context-dependent ground truth. We furthermore evaluate the recall and selectivity of our classifiers with simulated data. The results confirm that the trained classifiers are well suited for the data cleaning task.
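The balanced accuracy quoted in the abstract is conventionally defined as the mean of recall (true positive rate) and selectivity (true negative rate), so that a dominant class cannot inflate the score. The following is a minimal sketch of that definition, assuming a binary labeling in which 1 marks a record to keep and 0 a record to reject. The function name, the label encoding, and the toy data are illustrative assumptions and are not taken from the CRESST analysis.

```python
def balanced_accuracy(y_true, y_pred):
    """Return (recall, selectivity, balanced accuracy) for binary labels.

    Assumption for illustration: 1 = record kept, 0 = record rejected.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # kept, correctly
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # kept, but rejected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # rejected, correctly
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # rejected, but kept

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    recall = tp / (tp + fn)        # share of positive records recovered
    selectivity = tn / (tn + fp)   # share of negative records recovered
    return recall, selectivity, (recall + selectivity) / 2


# Toy labels, invented for this example only.
labels      = [1, 1, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 0, 1]
recall, selectivity, score = balanced_accuracy(labels, predictions)
print(recall, selectivity, score)
```

With these toy labels, plain accuracy would be 4/6, while the balanced score weights the two classes equally regardless of how many records each contains.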
Source
European Physical Journal Plus, 2023, 138 (01)