Spark based Parallel Frequent Pattern Rules for Social Media Data Analytics

Authors
Chaturvedi, Shubhangi [1]
Saritha, Sri Khetwat [2]
Chaturvedi, Animesh [3]
Affiliations
[1] PDPM Indian Inst Informat Technol Design & Mfg Ja, Comp Sci & Engn, Jabalpur, MP, India
[2] Maulana Azad Natl Inst Technol NIT Bhopal, Comp Sci & Engn, Bhopal, MP, India
[3] Indian Inst Informat Technol Dharwad IIIT Dharwad, Data Sci & Intelligent Syst, Dharwad, Karnataka, India
Keywords
Apache Spark; Parallel Frequent Pattern Growth; Social media; Data pre-processing; Association rule mining; ASSOCIATION RULES; DISCOVERY
DOI
10.1109/CCGridW59191.2023.00039
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The number of social media users is increasing, and the volume of data they produce is growing rapidly with it. Mining and analyzing social media data can reveal hidden information that is helpful in decision-making. Predicting co-occurring words, together with a confidence value, can provide deep insight into social media. This paper presents an applied process for mining social media datasets to retrieve frequent patterns (or rules) in cost-effective time. The retrieved patterns can be useful in making decisions related to social media. The experiments are performed on three social media datasets, and the resulting rules are analyzed while varying the thresholds (minimum support and minimum confidence). Experiments are performed for both Frequent Pattern (FP) Growth and Parallel FP (PFP) Growth on the same datasets. The parallel computation is achieved with a scalable Apache Spark environment. The execution times of FP-Growth and PFP-Growth on the same datasets are also reported. The experiments show that the FP-Growth implementation in SPMF requires preprocessing to convert item-sets into transactional databases. This preprocessing is required only once, so the time subsequently needed to generate rules is lower. PFP-Growth, by contrast, does not require preprocessing of the dataset, which saves time because the association rules can be generated directly.
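To make the two thresholds mentioned in the abstract concrete, the following is a minimal sketch of how minimum support and minimum confidence filter rules between co-occurring words. It is not the paper's method: it counts word pairs by brute force rather than building an FP-tree, and it does not use Spark or SPMF. The function name `pairwise_rules` and the sample posts are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for word pairs.

    Brute-force illustration only; FP-Growth and PFP-Growth reach the same
    kind of rules without enumerating every pair explicitly.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for words in transactions:
        unique = sorted(set(words))          # each word counts once per post
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # fraction of posts containing both words
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):        # test the rule in both directions
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical, already tokenized posts (assumed data, not from the paper).
posts = [
    ["spark", "data", "mining"],
    ["spark", "data"],
    ["social", "media", "data"],
    ["spark", "mining"],
]

rules = pairwise_rules(posts, min_support=0.5, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Raising either threshold shrinks the rule set, which is the effect the experiments examine when varying minimum support and minimum confidence.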
Pages: 168-175
Number of pages: 8