Evaluation of Machine Learning Frameworks on Bank Marketing and Higgs Datasets

Cited by: 4
Authors
Shashidhara, Bhuvan M. [1]
Jain, Siddharth [1]
Rao, Vinay D. [1]
Patil, Nagamma [1]
Raghavendra, G. S. [1]
Affiliations
[1] National Institute of Technology Karnataka, Department of Information Technology, Surathkal, India
Keywords
Machine Learning Algorithms; Big Data; Parallel Execution; Distributed Computing; WEKA; Scikit-Learn; Apache Spark
DOI
10.1109/ICACCE.2015.31
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Big data is an emerging field in which datasets of various sizes are being analyzed for potential applications. In parallel, many frameworks are being introduced through which these datasets can be fed into machine learning algorithms. Although some experiments have compared different machine learning algorithms on different data, these comparisons have not been carried out across different platforms. Our research aims to compare two selected machine learning algorithms on datasets of different sizes, deployed on three platforms: Weka, Scikit-Learn and Apache Spark. They are evaluated on training time, accuracy and root mean squared error. This comparison helps decide which platform is best suited for applying the selected, computationally expensive machine learning algorithms to data of a particular size. The experiments suggest that Scikit-Learn would be optimal for data that fits into memory. For huge datasets, Apache Spark would be optimal, as it performs parallel computations by distributing the data over a cluster. Hence this study concludes that the Spark platform, which has growing support for parallel implementations of machine learning algorithms, could be optimal for analyzing big data.
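The three evaluation criteria named in the abstract (training time, accuracy and root mean squared error) can be illustrated with a minimal, platform-independent sketch. This is not the authors' code. The abstract does not name the two algorithms, so `train_majority` is a hypothetical stand-in learner, and the helper names `accuracy`, `rmse` and `evaluate` and the toy data are assumptions made for illustration only.

```python
import math
import time
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between numeric labels and predictions."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def train_majority(X, y):
    """Hypothetical stand-in learner: always predicts the most common label."""
    majority = Counter(y).most_common(1)[0][0]
    return lambda rows: [majority for _ in rows]


def evaluate(train_fn, X_train, y_train, X_test, y_test):
    """Time the training call, then score the fitted model on held-out data."""
    start = time.perf_counter()
    predict = train_fn(X_train, y_train)
    training_time = time.perf_counter() - start
    y_pred = predict(X_test)
    return {
        "training_time_s": training_time,
        "accuracy": accuracy(y_test, y_pred),
        "rmse": rmse(y_test, y_pred),
    }


# Toy binary-labelled data (made up for illustration, not from the paper).
X_train = [[0.1], [0.4], [0.9], [0.7]]
y_train = [0, 0, 1, 0]
X_test = [[0.2], [0.8], [0.3], [0.6]]
y_test = [0, 1, 0, 0]

result = evaluate(train_majority, X_train, y_train, X_test, y_test)
```

Passing a different `train_fn` (for example, a wrapper around a real library's fit call) would let the same three numbers be collected per platform, which is the kind of comparison the abstract describes.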
Pages: 551-555
Page count: 5