Large-scale multi-label ensemble learning on Spark

Times cited: 11
Authors
Gonzalez-Lopez, Jorge [1]
Cano, Alberto [1]
Ventura, Sebastian [2]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
[2] Univ Cordoba, Dept Comp Sci, Cordoba, Spain
Keywords
Multi-label learning; Ensemble learning; Distributed computing; Apache Spark; Big data; MapReduce; Performance
DOI
10.1109/Trustcom/BigDataSE/ICESS.2017.328
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Multi-label learning is a challenging problem that has received growing attention in the research community in recent years. Hence, there is a growing demand for effective and scalable multi-label learning methods for larger datasets, both in terms of the number of instances and the number of output labels. The use of ensemble classifiers is a popular approach for improving multi-label model accuracy, especially for datasets with high-dimensional label spaces. However, the increasing computational complexity of the algorithms in such ever-growing high-dimensional label spaces requires new approaches to manage data effectively and efficiently in distributed computing environments. Spark is a framework based on MapReduce, a distributed programming model that offers a robust paradigm for handling large-scale datasets in a cluster of nodes. This paper focuses on multi-label ensembles and proposes five different implementations that use parallel and distributed computing on Spark, analyzing the impact of each on the performance of the ensemble. The experimental study, on 29 benchmark datasets, shows the benefits of the distributed implementations over traditional single-node, single-thread execution, both in predictive performance across multiple metrics and in significant speedup.
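The abstract describes ensembles whose members can be trained independently and whose votes are then combined, which is the shape of a MapReduce job. The following is a minimal, single-machine Python sketch of that decomposition, not the paper's implementation: each hypothetical ensemble member trains a toy 1-nearest-neighbour multi-label learner on its own bootstrap sample (the map step), the members' 0/1 label votes are summed element-wise (the reduce step), and a label is predicted when more than half of the members vote for it. The base learner, the bootstrap scheme, and the names `member_votes`, `add_votes`, and `ensemble_predict` are all assumptions chosen for illustration.

```python
import random
from functools import reduce

# Each sample is (feature_vector, label_vector); label_vector is a tuple of 0/1 flags.


def bootstrap(data, seed):
    """Draw a bootstrap sample (with replacement) for one ensemble member."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(len(data))]


def nn_predict(train, x):
    """Toy multi-label base learner (an assumption, not the paper's):
    copy the label vector of the nearest training instance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, labels = min(train, key=lambda s: sq_dist(s[0], x))
    return labels


def member_votes(args):
    """Map step: one member trains on its bootstrap sample and votes on every query."""
    data, queries, seed = args
    sample = bootstrap(data, seed)
    return [nn_predict(sample, x) for x in queries]


def add_votes(a, b):
    """Reduce step: element-wise sum of two members' vote matrices.
    Addition is associative and commutative, so members can be combined in any order."""
    return [[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]


def ensemble_predict(data, queries, n_members=5, threshold=0.5):
    """Hypothetical bagged multi-label ensemble: map members, reduce votes, threshold."""
    jobs = [(data, queries, seed) for seed in range(n_members)]
    votes = reduce(add_votes, map(member_votes, jobs))
    # A label is predicted when the fraction of members voting for it exceeds the threshold.
    return [tuple(int(v / n_members > threshold) for v in row) for row in votes]
```

Because `member_votes` depends only on its own arguments and `add_votes` is associative and commutative, the built-in `map` and `functools.reduce` could in principle be replaced by PySpark's `SparkContext.parallelize(jobs).map(member_votes).reduce(add_votes)`. That port is itself an assumption; the paper's five Spark implementations may partition data and ensemble members differently.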
Pages: 893-900
Number of pages: 8
Related papers (50 in total)
  • [21] Large-scale online semantic indexing of biomedical articles via an ensemble of multi-label classification models
    Papanikolaou, Yannis
    Tsoumakas, Grigorios
    Laliotis, Manos
    Markantonatos, Nikos
    Vlahavas, Ioannis
    JOURNAL OF BIOMEDICAL SEMANTICS, 8
  • [22] Accurate and Efficient Large-Scale Multi-Label Learning With Reduced Feature Broad Learning System Using Label Correlation
    Huang, Jintao
    Vong, Chi-Man
    Chen, C. L. Philip
    Zhou, Yimin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 10240 - 10253
  • [23] Large scale multi-label learning using Gaussian processes
    Panos, Aristeidis
    Dellaportas, Petros
    Titsias, Michalis K.
    MACHINE LEARNING, 2021, 110 (05) : 965 - 987
  • [25] Dynamic ensemble learning for multi-label classification
    Zhu, Xiaoyan
    Li, Jiaxuan
    Ren, Jingtao
    Wang, Jiayin
    Wang, Guangtao
    INFORMATION SCIENCES, 2023, 623 : 94 - 111
  • [26] A multi-core computing approach for large-scale multi-label classification
    Rodriguez, Juan Manuel
    Godoy, Daniela
    Mateos, Cristian
    Zunino, Alejandro
    INTELLIGENT DATA ANALYSIS, 2017, 21 (02) : 329 - 352
  • [27] Meta-LMTC: Meta-Learning for Large-Scale Multi-Label Text Classification
    Wang, Ran
    Su, Xi'ao
    Long, Siyu
    Dai, Xinyu
    Huang, Shujian
    Chen, Jiajun
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8633 - 8646
  • [28] A Deep Learning-Based Cluster Analysis Method for Large-Scale Multi-Label Images
    Xu, Yanping
    TRAITEMENT DU SIGNAL, 2022, 39 (03) : 931 - 937
  • [29] Large-scale multi-label classification using unknown streaming images
    Zhang, Yu
    Wang, Yin
    Liu, Xu-Ying
    Mi, Siya
    Zhang, Min-Ling
    PATTERN RECOGNITION, 2020, 99
  • [30] Deep Determinantal Point Process for Large-Scale Multi-Label Classification
    Xie, Pengtao
    Salakhutdinov, Ruslan
    Mou, Luntian
    Xing, Eric P.
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 473 - 482