A Generalized MapReduce Approach for Efficient Mining of Large Data Sets in the GRID

Cited by: 0
Authors
Roehm, Matthias [1]
Grabert, Matthias [1]
Schweiggert, Franz [1]
Affiliations
[1] Univ Ulm, Inst Appl Informat Proc, Ulm, Germany
Keywords
Data mining; Grid; MapReduce
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The growing computerization in modern academic and industrial sectors is generating huge volumes of electronic data. Data mining is considered the technology for extracting knowledge from these data. With an ever-increasing amount of data and growing complexity of modern data mining applications, the demand for resources is rising tremendously. Grid and cloud technologies promise to meet the requirements of heterogeneous, large-scale, and distributed data mining applications. The DataMiningGrid system was developed to address some of these issues and to provide high performance and scalability, sophisticated support for different types of users, flexible extensibility features, and support for relevant standards. While the DataMiningGrid, like most related grid systems, focused on compute-intensive applications, Google's MapReduce paradigm and cloud computing have introduced new solutions for efficient data analysis. Based on the DataMiningGrid, we developed the DataMiningGrid-Divide&Conquer system, which combines these technologies into a general-purpose data mining system suited to the different aspects of today's data analysis challenges. The system forms the core of the Fleet Data Acquisition Miner for analyzing the data generated by the Daimler fuel cell vehicle fleet.
Pages: 14-19
Page count: 6