Large-scale multi-label ensemble learning on Spark

Cited by: 11

Authors:

Gonzalez-Lopez, Jorge [1]

Cano, Alberto [1]

Ventura, Sebastian [2]

Affiliations:

[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA

[2] Univ Cordoba, Dept Comp Sci, Cordoba, Spain

Source:

2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS | 2017

Keywords:

Multi-label learning; Ensemble learning; Distributed computing; Apache Spark; Big data; MAPREDUCE; PERFORMANCE;

DOI:

10.1109/Trustcom/BigDataSE/ICESS.2017.328

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Multi-label learning is a challenging problem that has received growing attention in the research community in recent years. Hence, there is a growing demand for effective and scalable multi-label learning methods for larger datasets, both in terms of the number of instances and the number of output labels. The use of ensemble classifiers is a popular approach for improving multi-label model accuracy, especially for datasets with high-dimensional label spaces. However, the increasing computational complexity of the algorithms in such ever-growing high-dimensional label spaces requires new approaches to manage data effectively and efficiently in distributed computing environments. Spark is a framework based on MapReduce, a distributed programming model that offers a robust paradigm to handle large-scale datasets in a cluster of nodes. This paper focuses on multi-label ensembles and proposes a number of implementations based on parallel and distributed computing with Spark. Additionally, five different implementations are proposed, and their impact on the performance of the ensemble is analyzed. The experimental study shows the benefits of using distributed implementations over the traditional single-node, single-thread execution, in terms of both performance across multiple metrics and significant speedup, tested on 29 benchmark datasets.

Citation

Pages: 893-900

Page count: 8

