Bring Your Own Learner! A Cloud-Based, Data-Parallel Commons for Machine Learning

Cited by: 11
Authors
Arnaldo, Ignacio [1]
Veeramachaneni, Kalyan [1]
Song, Andrew [1]
O'Reilly, Una-May [1]
Affiliation
[1] MIT, AnyScale Learning For All (ALFA) Group, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139 USA
Keywords
LARGE-SCALE DATA; ENSEMBLE; BENCHMARKING; ALGORITHMS
DOI
10.1109/MCI.2014.2369892
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce FCUBE, a cloud-based framework that enables machine learning researchers to contribute their learners to its community-shared repository. FCUBE exploits data parallelism in lieu of algorithmic parallelization, allowing its users to tackle large-data problems efficiently and automatically. It passes random subsets of the data, generated via resampling, to multiple learners that it executes simultaneously, and then combines their model predictions with a simple fusion technique. FCUBE is an example of what we have named a Bring Your Own Learner model, which allows multiple machine learning researchers to contribute algorithms in a plug-and-play style. We contend that the Bring Your Own Learner model signals a design shift in cloud-based machine learning infrastructure because it is capable of executing anyone's supervised machine learning algorithm. We demonstrate FCUBE executing five different learners contributed by three different machine learning groups on a 100-node deployment on Amazon EC2. Together, the learners solve a publicly available classification problem, training on 11 million exemplars from the Higgs dataset.
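The abstract describes the execution model only at a high level: contributed learners share one plug-in contract, each is trained on its own resampled subset of the data, the learners run at the same time, and their predictions are fused. The Python sketch below illustrates that pattern on a single machine under stated assumptions; it is not FCUBE's code or API. The `Learner` protocol (`fit`/`predict_proba`), the two toy learners, the `resample`, `train_ensemble`, and `fuse` helpers, the sampling fraction, and the choice of probability averaging as the "simple fusion technique" are all hypothetical, since the abstract does not name the fusion rule. A thread pool stands in for the 100-node EC2 deployment.

```python
# Hypothetical sketch of a Bring Your Own Learner, data-parallel ensemble.
# All names, the toy learners, and the averaging fusion rule are assumptions,
# not FCUBE's actual interface or method.
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Protocol, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, binary label 0 or 1)


class Learner(Protocol):
    """Hypothetical plug-in contract: any contributed learner exposes these two methods."""

    def fit(self, data: Sequence[Example]) -> "Learner": ...

    def predict_proba(self, x: List[float]) -> float: ...  # estimated P(label == 1)


class NearestCentroid:
    """Toy contributed learner: scores a point by its distance to each class centroid."""

    def fit(self, data: Sequence[Example]) -> "NearestCentroid":
        dim = len(data[0][0])
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, y in data if y == label]
            self.centroids[label] = (
                [sum(r[i] for r in rows) / len(rows) for i in range(dim)] if rows else [0.0] * dim
            )
        return self

    def predict_proba(self, x: List[float]) -> float:
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.centroids[0]))
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.centroids[1]))
        # Being closer to the class-1 centroid (small d1) pushes the score toward 1.
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


class MeanThresholdStump:
    """Toy contributed learner: thresholds one feature at the midpoint of the class means."""

    def __init__(self, feature: int = 0):
        self.feature = feature

    def fit(self, data: Sequence[Example]) -> "MeanThresholdStump":
        means = {}
        for label in (0, 1):
            vals = [x[self.feature] for x, y in data if y == label]
            means[label] = sum(vals) / len(vals) if vals else 0.0
        self.threshold = (means[0] + means[1]) / 2
        self.high_is_one = means[1] >= means[0]
        return self

    def predict_proba(self, x: List[float]) -> float:
        above = x[self.feature] > self.threshold
        return 1.0 if above == self.high_is_one else 0.0


def resample(data: Sequence[Example], fraction: float, rng: random.Random) -> List[Example]:
    """Draw a random subset of the training data with replacement (bootstrap-style)."""
    k = max(1, int(len(data) * fraction))
    return [rng.choice(data) for _ in range(k)]


def _fit(factory: Callable[[], Learner], subset: List[Example]) -> Learner:
    return factory().fit(subset)


def train_ensemble(factories: Sequence[Callable[[], Learner]], data: Sequence[Example],
                   fraction: float = 0.5, seed: int = 0) -> List[Learner]:
    """Give each learner its own resampled subset and train all of them concurrently."""
    rng = random.Random(seed)
    subsets = [resample(data, fraction, rng) for _ in factories]  # drawn up front for reproducibility
    with ThreadPoolExecutor(max_workers=len(factories)) as pool:
        return list(pool.map(_fit, factories, subsets))


def fuse(models: Sequence[Learner], x: List[float], threshold: float = 0.5) -> int:
    """Assumed fusion rule: average the models' P(label == 1) and threshold the mean."""
    score = sum(m.predict_proba(x) for m in models) / len(models)
    return int(score >= threshold)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic two-class data: class 0 centred at (0, 0), class 1 at (2, 2).
    data = [([rng.gauss(2 * y, 0.5), rng.gauss(2 * y, 0.5)], y) for y in (0, 1) for _ in range(200)]
    factories = [NearestCentroid, NearestCentroid,
                 lambda: MeanThresholdStump(0), lambda: MeanThresholdStump(1)]
    models = train_ensemble(factories, data, fraction=0.3)
    accuracy = sum(fuse(models, x) == y for x, y in data) / len(data)
    print(f"training accuracy of fused ensemble: {accuracy:.2f}")
```

Because each learner sees only its own subset and is reached through two methods, a new contributor's algorithm joins the ensemble by appending one more factory to the list; the resampling, training, and fusion code stay unchanged. Threads keep the sketch simple but give little speed-up for CPU-bound Python learners; a `ProcessPoolExecutor` or a cluster scheduler would be closer to a real deployment, at the cost of requiring picklable factories (the lambdas above are not).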
Pages: 20-32
Page count: 13