Large-Scale Learning with AdaGrad on Spark

Authors
Hadgu, Asmelash Teka [1]
Nigam, Aastha [2]
Diaz-Aviles, Ernesto [3]
Affiliations
[1] L3S Res Ctr, Hannover, Germany
[2] Univ Notre Dame, Indiana, PA USA
[3] IBM Res, Dublin, Ireland
Keywords
Distributed machine learning; Adaptive gradient; Spark
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Stochastic Gradient Descent (SGD) is a simple yet very efficient online learning algorithm for optimizing convex (and often non-convex) functions and one of the most popular stochastic optimization methods in machine learning today. One drawback of SGD is that it is sensitive to the learning rate hyper-parameter. The Adaptive Sub-gradient Descent, AdaGrad, dynamically incorporates knowledge of the geometry of the data observed in earlier iterations to calculate a different learning rate for every feature. In this work, we implement a distributed version of AdaGrad for large-scale machine learning tasks using Apache Spark. Apache Spark is a fast cluster computing engine that provides similar scalability and fault tolerance properties to MapReduce, but in contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives allow user programs to load data into a cluster's memory and query it repeatedly, which makes it ideal for building scalable machine learning applications. We empirically evaluate our implementation on large-scale real-world problems in the machine learning canonical tasks of classification and regression. Comparing our implementation of AdaGrad with the SGD scheduler currently available in Spark's Machine Learning Library (MLlib), we experimentally show that AdaGrad saves time by avoiding manually setting a learning-rate hyperparameter, converges fast and can even achieve better generalization errors.
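The per-feature learning rate described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' distributed Spark implementation; the function names (`adagrad_step`, `minimize`, `toy_grad`), the toy objective, and the default values of `base_lr` and `eps` are all assumptions chosen for illustration. Each coordinate keeps a running sum of its squared gradients, and its step size is the base rate divided by the square root of that sum.

```python
import math

def adagrad_step(weights, grads, accum, base_lr=0.1, eps=1e-8):
    """One AdaGrad update; returns the new weights and the new accumulator."""
    new_weights, new_accum = [], []
    for w, g, a in zip(weights, grads, accum):
        a = a + g * g                        # running sum of squared gradients
        lr = base_lr / (math.sqrt(a) + eps)  # separate learning rate per feature
        new_weights.append(w - lr * g)
        new_accum.append(a)
    return new_weights, new_accum

def minimize(grad_fn, weights, steps=500, base_lr=0.5):
    """Repeatedly apply adagrad_step using gradients from grad_fn."""
    accum = [0.0] * len(weights)
    for _ in range(steps):
        weights, accum = adagrad_step(weights, grad_fn(weights), accum, base_lr)
    return weights

# Hypothetical toy objective f(w) = 100 * w0^2 + w1^2, whose two
# coordinates have very different gradient scales.
def toy_grad(w):
    return [200.0 * w[0], 2.0 * w[1]]

result = minimize(toy_grad, [1.0, 1.0])
```

Because each gradient is divided by its own accumulated magnitude, both coordinates of the toy problem move by roughly the same amount per step even though their raw gradients differ by a factor of 100. Plain SGD with a single global rate would need that rate tuned to the steeper coordinate.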
Pages: 2828-2830
Number of pages: 3