A MapReduce-based approach to social network big data mining

Cited by: 1
Author
Qi, Fuli [1]
Affiliation
[1] Shanghai Zhongqiao Vocat & Tech Univ, Sch Informat Engn, Shanghai 201514, Peoples R China
Keywords
Social network; big data; MapReduce; parallel K-means clustering algorithm; Weibo topic; ALGORITHM;
DOI
10.3233/JCM-226903
Chinese Library Classification
T [Industrial Technology];
Discipline classification code
08;
Abstract
The rapid development of social networks has made it more convenient for users to receive information. As a network communication platform that people use daily, microblogging carries a vast amount of information. To address the low efficiency and poor clustering quality of the K-means algorithm, a parallel K-means clustering algorithm based on the MapReduce model is studied. To ease the difficulty of calculating the similarity of microblog topic texts, the vector space model and semantic similarity are used to calculate the similarity between texts and improve the quality of microblog text classification. The data expansion rates of the corresponding nodes under different data sets show that the average expansion rate of the parallel K-means algorithm reaches 0.89 and that its running rate is the highest. The results show that the parallel K-means algorithm has good clustering stability and the highest clustering quality, reaching 1.24. Its clustering time is the shortest, averaging 1.27 minutes, so its clustering effect and efficiency are the best. In the quality analysis of Weibo topic recommendation, the parallel K-means algorithm (P-K-means) achieves a recommendation accuracy of 95.64% and a user satisfaction of 98.64%, the best recommendation effect. This indicates that the parallel K-means clustering algorithm based on MapReduce performs best in microblog topic mining and recommendation, efficiently recommending topics of interest to users and enhancing their microblogging experience.
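The abstract names a parallel K-means algorithm built on the MapReduce model but does not give its details. The following is only a minimal single-process sketch of how one K-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). The function names `map_phase`, `reduce_phase`, and `kmeans` are hypothetical, and nothing here should be read as the paper's actual implementation, which would run on a distributed framework.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map step (assumed form): emit (nearest centroid index, (point, 1))."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce step (assumed form): average the points emitted for each key."""
    sums = {}
    counts = defaultdict(int)
    for idx, (p, n) in pairs:
        if idx not in sums:
            sums[idx] = list(p)
        else:
            sums[idx] = [a + b for a, b in zip(sums[idx], p)]
        counts[idx] += n
    # Centroids that received no points keep their previous position.
    updated = list(centroids)
    for idx, s in sums.items():
        updated[idx] = tuple(v / counts[idx] for v in s)
    return updated


def kmeans(points, centroids, iterations=10):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(iterations):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 0.0)]))
```

In a real MapReduce job the map step would run on separate data splits and the framework would group the emitted pairs by key before the reduce step; the sketch collapses both into ordinary Python iteration.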
Pages: 2535-2547
Page count: 13