A Novel Nodesets-Based Frequent Itemset Mining Algorithm for Big Data using MapReduce

Authors
Sivaiah, Borra [1, 2]
Rao, Ramisetty Rajeswara [3 ]
Affiliations
[1] Jawaharlal Nehru Technol Univ, Dept Comp Sci & Engn, Kakinada, Andhra Pradesh, India
[2] CMR Coll Engn Technol, Hyderabad, India
[3] Jawaharlal Nehru Technol Univ, Dept Comp Sci & Engn, Gurajada, Andhra Pradesh, India
Keywords
Big Data; Frequent Itemset Mining (FIM); MapReduce Programming Paradigm (MRPP); Fast and Scalable Frequent Itemset Mining (FSFIM)
DOI
Not available
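Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809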
Abstract
Data from different sources in organizations is growing rapidly, and the resulting volumes, known as big data, cannot be handled in a scalable fashion by traditional tools and techniques. Similarly, many existing frequent itemset mining algorithms perform well but have scalability problems, as they cannot exploit the parallel processing power available locally or in cloud infrastructure. Since the big data and cloud ecosystem overcomes the limitations of computing resources, distributed programming paradigms such as MapReduce are a natural choice. In this paper, we propose a novel algorithm, Nodesets-based Fast and Scalable Frequent Itemset Mining (FSFIM), to extract frequent itemsets from Big Data. A Pre-Order Coding (POC) tree is used to represent the data and speed up processing. The Nodeset is the underlying data structure, and it is efficient in discovering frequent itemsets. Compared with their predecessors, Node-lists and N-lists, Nodesets save half of the memory because they need only the pre-order or the post-order code of each node rather than both. Cloudera's Distribution of Hadoop (CDH), a MapReduce framework, is used for the empirical study, and a prototype application is built to evaluate the performance of FSFIM. Experimental results revealed that FSFIM outperforms existing algorithms such as Mahout PFP, MLlib PFP, and BigFIM. FSFIM is found to be faster and more scalable in mining frequent itemsets, and to be an ideal candidate for real-time applications that mine frequent itemsets from Big Data.
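The abstract does not give implementation details, so the following is only a minimal single-machine sketch of the two structures it names, not the authors' FSFIM and not a MapReduce job. It assumes the usual construction: frequent items are sorted by descending support, transactions are inserted into a prefix tree, each node receives a pre-order code, and the Nodeset of an item is the list of (pre-order code, count) pairs of the nodes that carry that item. The names `Node`, `build_poc_tree`, and `item_nodesets` are hypothetical and chosen for illustration.

```python
from collections import Counter


class Node:
    """One POC-tree node: an item, its count, and a pre-order code."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node, kept in insertion order
        self.pre = -1       # pre-order code, assigned after the tree is built


def build_poc_tree(transactions, min_support):
    """Build a prefix tree over frequent items and assign pre-order codes."""
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in freq.items() if c >= min_support}

    def rank(item):
        # Assumed ordering: descending support, ties broken by item name.
        return (-freq[item], item)

    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted(set(t) & frequent, key=rank):
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            child.count += 1
            node = child

    # Iterative pre-order traversal; the root receives code 0.
    code = 0
    stack = [root]
    while stack:
        node = stack.pop()
        node.pre = code
        code += 1
        stack.extend(reversed(list(node.children.values())))
    return root


def item_nodesets(root):
    """Map each item to its Nodeset: sorted (pre-order code, count) pairs."""
    nodesets = {}
    stack = list(root.children.values())
    while stack:
        node = stack.pop()
        nodesets.setdefault(node.item, []).append((node.pre, node.count))
        stack.extend(node.children.values())
    for entries in nodesets.values():
        entries.sort()
    return nodesets


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["d"]]
    tree = build_poc_tree(data, min_support=2)
    for item, entries in sorted(item_nodesets(tree).items()):
        # The support of a single item is the sum of the counts in its Nodeset.
        print(item, entries, sum(count for _, count in entries))
```

Each Nodeset entry stores a single code per node, which is the memory saving the abstract attributes to Nodesets over structures that keep both a pre-order and a post-order code. Extending Nodesets to longer itemsets and distributing the work across mappers and reducers is outside the scope of this sketch.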
Pages: 1051-1058
Number of pages: 8