Spark based Parallel Frequent Pattern Rules for Social Media Data Analytics

Times cited: 0

Authors:

Chaturvedi, Shubhangi [1]

Saritha, Sri Khetwat [2]

Chaturvedi, Animesh [3]

Affiliations:

[1] PDPM Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Computer Science & Engineering, Jabalpur, MP, India

[2] Maulana Azad National Institute of Technology (NIT Bhopal), Computer Science & Engineering, Bhopal, MP, India

[3] Indian Institute of Information Technology Dharwad (IIIT Dharwad), Data Science & Intelligent Systems, Dharwad, Karnataka, India

Source:

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW) | 2023

Keywords:

Apache Spark; Parallel Frequent Pattern Growth; Social media; Data pre-processing; Association rule mining; Association rules; Discovery

DOI:

10.1109/CCGridW59191.2023.00039

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The number of social media users is increasing, and the volume of data they produce is growing tremendously. Mining and analyzing social media data can reveal hidden information that is helpful in decision-making, and predicting co-occurring words together with their confidence can provide deep insights into social media. This paper presents an applied process for mining social media datasets to retrieve frequent patterns (or rules) in a cost-effective amount of time; the retrieved patterns can support decisions related to social media. Experiments are performed on three social media datasets, and the resulting rules are analyzed while varying the threshold values (minimum support and minimum confidence). Both Frequent Pattern (FP) Growth and Parallel FP (PFP) Growth are run on the same datasets, with the parallel computation carried out in a scalable Apache Spark environment, and the execution times of the two algorithms are reported. The experiments show that the SPMF implementation of FP-Growth requires pre-processing to convert item-sets into a transactional database. This pre-processing is needed only once, after which rule generation takes less time. In contrast, PFP-Growth requires no pre-processing of the dataset, which saves time because the association rules are generated directly.
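The two thresholds named in the abstract can be made concrete with a small sketch. The Python below is an illustrative assumption, not the authors' code: it treats each post as a set of words (a hypothetical whitespace tokenization), finds frequent word-sets with a simple level-wise (Apriori-style) search instead of FP-Growth or PFP-Growth, and keeps a rule X → Y only if support(X ∪ Y) ≥ minimum support and confidence = support(X ∪ Y) / support(X) ≥ minimum confidence. The function names and the toy posts are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    Level-wise (Apriori-style) search, used here only for illustration;
    it is NOT FP-Growth.
    """
    tx_sets = [frozenset(t) for t in transactions]
    n = len(tx_sets)
    items = sorted({item for t in tx_sets for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    size = 1
    while candidates:
        frequent = []
        for cand in candidates:
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in tx_sets if cand <= t)
            if count / n >= min_support:
                result[cand] = count / n
                frequent.append(cand)
        # Any frequent (size+1)-itemset is the union of two frequent size-itemsets.
        size += 1
        candidates = list({a | b for a in frequent for b in frequent if len(a | b) == size})
    return result


def association_rules(freq, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples.

    confidence(X -> Y) = support(X | Y) / support(X).
    """
    rules = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is also frequent, so it is in freq.
                confidence = support / freq[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical toy "posts"; each becomes a transaction of distinct words.
posts = [
    "spark fast",
    "spark fast cluster",
    "spark cluster",
    "hadoop cluster",
]
transactions = [set(post.lower().split()) for post in posts]

freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.6)
for a, c, s, conf in sorted(rules, key=lambda r: (-r[3], sorted(r[0]))):
    print(f"{sorted(a)} -> {sorted(c)}  support={s:.2f}  confidence={conf:.2f}")
```

On this toy data, raising `min_confidence` from 0.6 to 0.7 leaves only the rule fast → spark, which illustrates how varying the thresholds changes the rule set. The level-wise search rescans every transaction for each candidate, which is the cost FP-Growth avoids by compressing transactions into an FP-tree. Spark offers a parallel FP-Growth as `FPGrowth` in `pyspark.ml.fpm` (with `minSupport` and `minConfidence` parameters), although the abstract does not state which Spark API the authors used.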


Pages: 168-175

Page count: 8
