Spark based Parallel Frequent Pattern Rules for Social Media Data Analytics

Cited by: 0
Authors
Chaturvedi, Shubhangi [1 ]
Saritha, Sri Khetwat [2 ]
Chaturvedi, Animesh [3 ]
Affiliations
[1] PDPM Indian Inst Informat Technol Design & Mfg Jabalpur, Comp Sci & Engn, Jabalpur, MP, India
[2] Maulana Azad Natl Inst Technol NIT Bhopal, Comp Sci & Engn, Bhopal, MP, India
[3] Indian Inst Informat Technol Dharwad IIIT Dharwad, Data Sci & Intelligent Syst, Dharwad, Karnataka, India
Keywords
Apache Spark; Parallel Frequent Pattern Growth; Social media; Data pre-processing; Association rule mining; ASSOCIATION RULES; DISCOVERY
DOI
10.1109/CCGridW59191.2023.00039
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The number of social media users is increasing, and so is the volume of data they produce. Mining and analyzing social media data can reveal a great deal of hidden information that is helpful in decision-making. Predicting co-occurring words, together with the confidence of each co-occurrence, can provide deep insights into social media. This paper presents an applied process for mining social media datasets to retrieve frequent patterns (or rules) efficiently in terms of execution time. The retrieved patterns can support decisions related to social media. Experiments are performed on three social media datasets, and the resulting rules are analyzed while varying the threshold values (minimum support and minimum confidence). Both Frequent Pattern (FP) Growth and Parallel FP (PFP) Growth are run on the same datasets, with the parallel computation carried out in a scalable Apache Spark environment, and the execution times of both algorithms are reported. The experiments show that the SPMF implementation of FP-Growth requires preprocessing to convert item-sets into a transactional database. This preprocessing is needed only once, after which the time required to generate rules is lower. PFP-Growth, in contrast, does not require any preprocessing of the dataset, which saves time because the association rules can be generated directly.
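The SPMF preprocessing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each social media post is treated as one transaction made of its distinct lowercase words, and that the target is SPMF's plain transaction format (one transaction per line, space-separated positive integer item IDs in ascending order). The helper name `to_spmf_transactions` and the tokenizing regular expression are hypothetical.

```python
import re


def to_spmf_transactions(posts):
    """Hypothetical helper: convert raw posts into SPMF-style transaction lines.

    Each word gets a positive integer item ID (assigned in first-seen order,
    starting at 1). Each post becomes one line of distinct IDs, sorted ascending
    and separated by single spaces.
    """
    vocab = {}  # word -> integer item ID
    lines = []
    for post in posts:
        # Assumed tokenizer: lowercase runs of letters, digits, '#' and '@'
        tokens = re.findall(r"[a-z0-9#@]+", post.lower())
        ids = set()  # a word repeated within one post counts once
        for word in tokens:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
            ids.add(vocab[word])
        lines.append(" ".join(str(i) for i in sorted(ids)))
    return lines, vocab


if __name__ == "__main__":
    posts = ["Spark makes big data fast", "big data on Spark", "fast data"]
    lines, vocab = to_spmf_transactions(posts)
    print("\n".join(lines))  # contents of an SPMF input file
    print(vocab)             # kept to map item IDs in the rules back to words
```

Writing the joined lines to a text file yields an input file for SPMF's FP-Growth, while `vocab` is kept so that the integer items in the mined rules can be translated back to words. Because the conversion runs once per dataset, its cost is shared by every later run with different thresholds, which is the trade-off the abstract points out.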
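The effect of the two thresholds can be shown with a small sketch. It uses relative support (the fraction of transactions that contain an itemset) and defines the confidence of a rule X → Y as supp(X ∪ Y) / supp(X). For brevity it finds frequent itemsets by Apriori-style level-wise enumeration rather than FP-Growth. Both return the same frequent itemsets for a given minimum support, but this is a swapped-in technique, not the paper's implementation, and the function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: relative support} for all itemsets with support >= min_support.

    Level-wise (Apriori-style) stand-in for FP-Growth: frequent k-itemsets are
    joined into (k+1)-item candidates, and each candidate's support is counted.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    result = {}
    candidates = {frozenset([item]) for t in sets for item in t}  # 1-itemsets
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for t in sets if c <= t) / n
            if support >= min_support:
                level[c] = support
        result.update(level)
        # Every frequent (k+1)-itemset is the union of two frequent k-itemsets
        keys = list(level)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules X -> Y."""
    rules = []
    for itemset, supp in itemsets.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so x is in itemsets
                confidence = supp / itemsets[x]
                if confidence >= min_confidence:
                    rules.append((x, itemset - x, supp, confidence))
    return rules


if __name__ == "__main__":
    posts = [["spark", "data", "fast"], ["spark", "data"], ["data", "fast"], ["spark", "big"]]
    itemsets = frequent_itemsets(posts, min_support=0.5)
    for x, y, s, c in association_rules(itemsets, min_confidence=0.6):
        print(sorted(x), "->", sorted(y), f"support={s:.2f} confidence={c:.2f}")
```

On these four toy posts, raising `min_confidence` from 0.6 to 0.7 removes three of the four rules, leaving only fast → data; this is the kind of threshold sweep the abstract describes. At cluster scale, Spark MLlib's `FPGrowth`, which the Spark documentation describes as an implementation of PFP, exposes the same two thresholds as its `minSupport` and `minConfidence` parameters.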
Pages: 168-175
Number of pages: 8