Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence

Authors
Capuccini, Marco [1, 2, 3]
Carlsson, Lars [4]
Norinder, Ulf [5]
Spjuth, Ola [1, 2]
Affiliations
[1] Uppsala Univ, Dept Pharmaceut Biosci, SE-75124 Uppsala, Sweden
[2] Uppsala Univ, Sci Life Lab, SE-75124 Uppsala, Sweden
[3] Uppsala Univ, Dept Informat Technol, SE-75105 Uppsala, Sweden
[4] AstraZeneca R&D, Molndal, Sweden
[5] Swedish Toxicol Sci Res Ctr, SE-15136 Soedertaelje, Sweden
DOI
Not available
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The increasing size of datasets is a challenge for machine learning, and Big Data frameworks such as Apache Spark have shown promise for facilitating model building on distributed resources. Conformal prediction is a mathematical framework that makes it possible to assign valid confidence levels to object-specific predictions. This contrasts with current best practices, where the overall confidence level for predictions on unseen objects is estimated from previous performance, assuming exchangeability. Here we report a Spark-based distributed implementation of conformal prediction, which introduces valid confidence estimation to predictive modeling for Big Data analytics. Experimental results on two large-scale datasets show the validity and the scalability of the method, which is freely available as open source.
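The sketch below illustrates what an object-specific confidence level from conformal prediction looks like. It is a minimal, single-machine example of the inductive (split) variant of conformal classification. It is not the authors' Spark implementation. The nearest-centroid nonconformity measure, the toy one-dimensional data, and all function names are hypothetical choices made for illustration. Calibration scores are kept in a list of "partitions" only to show that the count behind each conformal p-value is a sum of per-partition counts. This suggests the computation can be spread across workers.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy setup: one numeric feature per object and a string class label.
Example = Tuple[float, str]


def fit_centroids(train: Sequence[Example]) -> Dict[str, float]:
    """Underlying model (illustrative choice): the mean feature value of each class."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def nonconformity(model: Dict[str, float], x: float, y: str) -> float:
    """How unusual (x, y) looks: distance from x to the centroid of label y."""
    return abs(x - model[y])


def count_at_least(scores: List[float], alpha: float) -> int:
    """Work done independently on one partition of calibration scores."""
    return sum(1 for s in scores if s >= alpha)


def p_value(partitions: List[List[float]], alpha: float) -> float:
    """Conformal p-value (#{calibration scores >= alpha} + 1) / (n + 1).

    Both the count and n are sums of per-partition values, so they could be
    computed in parallel and combined with a single reduction.
    """
    n = sum(len(part) for part in partitions)
    at_least = sum(count_at_least(part, alpha) for part in partitions)
    return (at_least + 1) / (n + 1)


def predict_set(model: Dict[str, float], partitions: List[List[float]],
                x: float, epsilon: float) -> List[str]:
    """All labels whose p-value exceeds the significance level epsilon."""
    return sorted(
        y for y in model
        if p_value(partitions, nonconformity(model, x, y)) > epsilon
    )


if __name__ == "__main__":
    # A proper training set fits the model; a disjoint calibration set supplies scores.
    train = [(0.0, "a"), (1.0, "a"), (4.0, "b"), (5.0, "b")]
    calibration = [(0.0, "a"), (1.5, "a"), (4.0, "b"), (6.0, "b")]
    model = fit_centroids(train)
    scores = [nonconformity(model, x, y) for x, y in calibration]
    partitions = [scores[:2], scores[2:]]  # stand-in for data spread over two workers
    print(predict_set(model, partitions, 1.0, epsilon=0.25))  # ['a']
    print(predict_set(model, partitions, 1.0, epsilon=0.1))   # ['a', 'b']
```

Under exchangeability, the true label falls outside the returned set with probability at most epsilon. This is the validity property that conformal prediction provides for each prediction. The example has only four calibration scores, so the smallest attainable p-value is 1/5 = 0.2. At a significance level below 0.2, no label can be excluded. Larger calibration sets allow finer significance levels.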
Pages: 61-67
Number of pages: 7