Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics

Cited by: 0
Authors
Khan, Ayaz H. [1 ]
Qamar, Ali Mustafa [2 ]
Yusuf, Aneeq [1 ]
Khan, Rehanullah [2 ]
Affiliations
[1] Karachi Inst Econ & Technol, Coll Comp & Informat Sci, Karachi, Pakistan
[2] Qassim Univ, Coll Comp, Mulaidah, Saudi Arabia
Keywords
Big data; deep learning; deep auto-encoders; Restricted Boltzmann Machines (RBM)
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The goal of big data analytics is to analyze datasets of high volume, velocity, and variety for large-scale business intelligence problems. These workloads are normally distributed across massively parallel analytical systems. Deep learning is part of a broader family of machine learning methods based on learning representations of data. It plays a significant role in information analysis by adding value to massive amounts of unsupervised data. A core area of research is the development of deep learning algorithms for the automatic extraction of complex data formats at a higher level of abstraction from massive volumes of data. In this paper, we present the latest research trends in the development of parallel algorithms, optimization techniques, tools, and libraries related to big data analytics and deep learning on various parallel architectures. The basic building blocks of deep learning, such as Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN), are identified and analyzed for the parallelization of deep learning models. We propose a parallel software API based on PyTorch, the Hadoop Distributed File System (HDFS), Apache Hadoop MapReduce, and MapReduce Job (MRJob) for developing large-scale deep learning models. We obtained a reduction of about 5-30% in the execution time of the deep auto-encoder model, even on a single-node Hadoop cluster. Furthermore, the code complexity of creating multi-layer deep learning models is significantly reduced.
Pages: 557-566
Page count: 10
Related papers
50 in total
  • [1] Software abstractions for large-scale deep learning models in big data analytics
    Khan, Ayaz H.
    Qamar, Ali Mustafa
    Yusuf, Aneeq
    Khan, Rehanullah
    [J]. International Journal of Advanced Computer Science and Applications, 2019, 10 (04): : 557 - 566
  • [2] Big Data Analytics on Large-Scale Socio-technical Software Engineering Archives
    Bayati, Shahabedin
    Parsons, David
    Susnjak, Teo
    Heidary, Marzieh
    [J]. 2015 3RD INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT), 2015, : 65 - 69
  • [3] Big Data for Enhanced Learning Analytics: A Case for Large-Scale Comparative Assessments
    Korfiatis, Nikolaos
    [J]. METADATA AND SEMANTICS RESEARCH, MTSR 2013, 2013, 390 : 225 - 233
  • [4] Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics
    Veiga, Jorge
    Exposito, Roberto R.
    Pardo, Xoan C.
    Taboada, Guillermo L.
    Tourino, Juan
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 424 - 431
  • [5] Deep Learning for Big Data Analytics
    Bathla, Gourav
    Aggarwal, Himanshu
    Rani, Rinkle
    [J]. ADVANCES IN COMPUTING AND INTELLIGENT SYSTEMS, ICACM 2019, 2020, : 391 - 399
  • [6] Distributed optimization over large-scale systems for big data analytics
    Shahbazian, Reza
    [J]. 4OR, 2021, 19 : 309 - 310
  • [7] Distributed optimization over large-scale systems for big data analytics
    Shahbazian, Reza
    [J]. 4OR-A QUARTERLY JOURNAL OF OPERATIONS RESEARCH, 2021, 19 (02): : 309 - 310
  • [8] BANKSAFE: Visual analytics for big data in large-scale computer networks
    Fischer, Fabian
    Fuchs, Johannes
    Mansmann, Florian
    Keim, Daniel A.
    [J]. INFORMATION VISUALIZATION, 2015, 14 (01) : 51 - 61
  • [9] Big Data Analytics for Large-scale Wireless Networks: Challenges and Opportunities
    Dai, Hong-Ning
    Wong, Raymond Chi-Wing
    Wang, Hao
    Zheng, Zibin
    Vasilakos, Athanasios V.
    [J]. ACM COMPUTING SURVEYS, 2019, 52 (05)
  • [10] Big Data, Big Results: Knowledge Discovery in Output from Large-Scale Analytics
    McCormick, Tyler H.
    Ferrell, Rebecca
    Karr, Alan F.
    Ryan, Patrick B.
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2014, 7 (05) : 404 - 412