A Generalized MapReduce Approach for Efficient mining of Large data Sets in the GRID

Cited by: 0

Authors:

Roehm, Matthias [1]

Grabert, Matthias [1]

Schweiggert, Franz [1]

Affiliations:

[1] Univ Ulm, Inst Appl Informat Proc, Ulm, Germany

Source:

PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, GRIDS, AND VIRTUALIZATION (CLOUD COMPUTING 2010) | 2010

Keywords:

Data mining; Grid; MapReduce

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

The growing computerization of modern academic and industrial sectors is generating huge volumes of electronic data. Data mining is considered the technology for extracting knowledge from these data. With the ever-increasing amount of data and the complexity of modern data mining applications, the demand for resources is rising tremendously. Grid and Cloud technologies promise to meet the requirements of heterogeneous, large-scale, and distributed data mining applications. The DataMiningGrid system was developed to address some of these issues and to provide high performance and scalability, sophisticated support for different types of users, flexible extensibility features, and support for relevant standards. While the DataMiningGrid, like most related grid systems, focused on compute-intensive applications, Google's MapReduce paradigm and cloud computing have introduced new solutions for efficient data analysis. Based on the DataMiningGrid, we developed the DataMiningGrid-Divide&Conquer system, which combines these technologies into a general-purpose data mining system suited to the different aspects of today's data analysis challenges. The system forms the core of the Fleet Data Acquisition Miner, which analyzes the data generated by the Daimler fuel cell vehicle fleet.


Pages: 14 - 19

Page count: 6

