Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence

Cited by: 0
Authors
Capuccini, Marco [1, 2, 3]
Carlsson, Lars [4]
Norinder, Ulf [5]
Spjuth, Ola [1, 2]
Affiliations
[1] Uppsala Univ, Dept Pharmaceut Biosci, SE-75124 Uppsala, Sweden
[2] Uppsala Univ, Sci Life Lab, SE-75124 Uppsala, Sweden
[3] Uppsala Univ, Dept Informat Technol, SE-75105 Uppsala, Sweden
[4] AstraZeneca R&D, Molndal, Sweden
[5] Swedish Toxicol Sci Res Ctr, SE-15136 Soedertaelje, Sweden
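Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract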
The increasing size of datasets is challenging for machine learning, and Big Data frameworks such as Apache Spark have shown promise for facilitating model building on distributed resources. Conformal prediction is a mathematical framework that allows valid confidence levels to be assigned to object-specific predictions. This contrasts with current best practices, where the overall confidence level for predictions on unseen objects is estimated based on previous performance, assuming exchangeability. Here we report a Spark-based distributed implementation of conformal prediction, which introduces valid confidence estimation in predictive modeling for Big Data analytics. Experimental results on two large-scale datasets show the validity and the scalability of the method, which is freely available as open source.
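To illustrate what "object-specific" confidence means here, the following is a minimal single-machine sketch of an inductive conformal classifier in plain Python. It is not the paper's Spark implementation: the function names, the nonconformity scores, and the labels "active" and "inactive" are all hypothetical, and the scores are assumed to come from some underlying model (higher means stranger).

```python
from typing import Dict, List, Set


def conformal_p_values(calibration_scores: List[float],
                       candidate_scores: Dict[str, float]) -> Dict[str, float]:
    """Return a conformal p-value for each candidate label.

    calibration_scores: nonconformity scores of held-out calibration
        examples, computed with their true labels (assumed given).
    candidate_scores: nonconformity score of the new object under each
        tentative label (assumed given).
    """
    n = len(calibration_scores)
    p_values = {}
    for label, alpha in candidate_scores.items():
        # Count calibration examples at least as strange as the new object.
        at_least_as_strange = sum(1 for s in calibration_scores if s >= alpha)
        # The +1 terms account for the new object itself.
        p_values[label] = (at_least_as_strange + 1) / (n + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], significance: float) -> Set[str]:
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical scores, for illustration only.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
candidates = {"active": 0.25, "inactive": 0.85}

p = conformal_p_values(calibration, candidates)
narrow = prediction_set(p, significance=0.25)  # 75% confidence
wide = prediction_set(p, significance=0.10)    # 90% confidence
print(p, narrow, wide)
```

With these made-up scores the p-values are 0.8 for "active" and 0.2 for "inactive", so the 75% prediction set contains only "active", while the 90% set contains both labels. Under the exchangeability assumption, the long-run error rate of such prediction sets is at most the chosen significance level, which is the sense in which the confidence is valid.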
Pages: 61-67
Number of pages: 7
Related papers
50 in total
  • [1] Coreset-based Conformal Prediction for Large-scale Learning
    Riquelme-Granada, Nery
    Khuong An Nguyen
    Luo, Zhiyuan
    [J]. CONFORMAL AND PROBABILISTIC PREDICTION AND APPLICATIONS, VOL 105, 2019, 105
  • [2] Large-Scale Machine Learning for Business Sector Prediction
    Angenent, Mitch N.
    Barata, Antonio Pereira
    Takes, Frank W.
    [J]. PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 1143 - 1146
  • [3] Large-Scale Learning with AdaGrad on Spark
    Hadgu, Asmelash Teka
    Nigam, Aastha
    Diaz-Aviles, Ernesto
    [J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2828 - 2830
  • [4] Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning
    Norinder, Ulf
    Spjuth, Ola
    Svensson, Fredrik
    [J]. JOURNAL OF CHEMINFORMATICS, 2021, 13 (01)
  • [5] Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning
    Ulf Norinder
    Ola Spjuth
    Fredrik Svensson
    [J]. Journal of Cheminformatics, 13
  • [6] A Machine-Learning Approach for Communication Prediction of Large-Scale Applications
    Papadopoulou, Nikela
    Goumas, Georgios
    Koziris, Nectarios
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 120 - 123
  • [7] Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark
    Chaudhury, Mousumi
    Karami, Amin
    Ghazanfar, Mustansar Ali
    [J]. ELECTRONICS, 2022, 11 (16)
  • [8] A Survey on Large-Scale Machine Learning
    Wang, Meng
    Fu, Weijie
    He, Xiangnan
    Hao, Shijie
    Wu, Xindong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06) : 2574 - 2594
  • [9] Accelerating Relevance Vector Machine for Large-Scale Data on Spark
    Liu, Fang
    Zhong, Hao
    Li, Si-Han
    [J]. 4TH ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND APPLICATIONS (ITA 2017), 2017, 12
  • [10] Large-scale comparison of machine learning algorithms for target prediction of natural products
    Liang, Lu
    Liu, Ye
    Kang, Bo
    Wang, Ru
    Sun, Meng-Yu
    Wu, Qi
    Meng, Xiang-Fei
    Lin, Jian-Ping
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (05)