Social Media Analysis using Optimized K-Means Clustering

Cited by: 0
Authors
Alsayat, Ahmed [1]
El-Sayed, Hoda [1]
Affiliations
[1] Bowie State Univ, Dept Comp Sci, Bowie, MD 20715 USA
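Keywords
K-Means; Genetic Algorithm; Clustering; Social Media Analysis; Data Mining
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract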
The growing influence of social media and the enormous participation of its users create new opportunities to study human social behavior and to analyze large volumes of streaming data. One interesting problem is to distinguish between different kinds of users, for example users who act as leaders and introduce new issues and discussions on social media. Positive or negative attitudes can also be inferred from those discussions. Such problems require a formal interpretation of social media logs and of the unit of information that can spread from person to person through the social network. Once social media data such as user messages are parsed and network relationships are identified, data mining techniques can be applied to group different types of communities. However, existing methods rarely capture the appropriate granularity of user communities and their behavior. In this paper, we present a framework for the novel task of detecting communities by clustering messages from large streams of social data. Our framework combines the K-Means clustering algorithm with a Genetic Algorithm and the Optimized Cluster Distance (OCD) method. Its goal is twofold: to overcome the difficulty standard K-Means has in choosing good initial centroids by selecting them with the Genetic Algorithm, and to maximize the distance between clusters through pairwise clustering with OCD, yielding more accurate clusters. We used various cluster validation metrics to evaluate the performance of our algorithm. The analysis shows that the proposed method gives better clustering results and provides a novel use case of grouping user communities based on their activities. Our approach is optimized and scalable for real-time clustering of social media data.
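The idea of seeding K-Means with a Genetic Algorithm, as the abstract describes it, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the chromosome encoding, fitness function, or operators, so everything here is an assumption. The function names (`sse`, `ga_initial_centroids`, `kmeans`), the parameters (`pop_size`, `generations`, `mutation_rate`), the use of within-cluster sum of squared errors as fitness, and the synthetic 2-D data standing in for message features are all hypothetical. The OCD step is not specified in the abstract and is not attempted here.

```python
import numpy as np

def sse(X, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def ga_initial_centroids(X, k, pop_size=20, generations=30,
                         mutation_rate=0.1, seed=0):
    # Assumed encoding: a chromosome is k row indices of X used as centroids.
    rng = np.random.default_rng(seed)
    n = len(X)
    pop = np.array([rng.choice(n, size=k, replace=False)
                    for _ in range(pop_size)])
    for _ in range(generations):
        fitness = np.array([sse(X, X[c]) for c in pop])
        order = np.argsort(fitness)               # lower SSE is better
        parents = pop[order[: pop_size // 2]]     # truncation selection, elitist
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), size=2, replace=False)]
            child = np.where(rng.random(k) < 0.5, a, b)   # uniform crossover
            for j in range(k):                    # random-reset mutation
                if rng.random() < mutation_rate:
                    child[j] = rng.integers(n)
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    fitness = np.array([sse(X, X[c]) for c in pop])
    return X[pop[np.argmin(fitness)]].copy()

def kmeans(X, centroids, max_iter=100):
    # Standard Lloyd iterations starting from the given centroids.
    centroids = centroids.astype(float).copy()
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):                      # keep old centroid if empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    # Recompute labels so they match the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids

# Synthetic stand-in for message feature vectors: three 2-D blobs.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])

init = ga_initial_centroids(X, k=3)
labels, centroids = kmeans(X, init)
print("SSE at GA seeds:", sse(X, init))
print("SSE after K-Means:", sse(X, centroids))
```

In this sketch the Genetic Algorithm only searches over which data points to use as starting centroids, and ordinary K-Means then refines them. Real social media messages would first need to be turned into numeric feature vectors, which the sketch leaves out.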
Pages: 61-66
Page count: 6
Related papers
50 in total
  • [1] Optimized big data K-means clustering using MapReduce
    Cui, Xiaoli
    Zhu, Pingfei
    Yang, Xin
    Li, Keqiu
    Ji, Changqing
    [J]. JOURNAL OF SUPERCOMPUTING, 2014, 70 (03) : 1249 - 1259
  • [2] Optimized big data K-means clustering using MapReduce
    Xiaoli Cui
    Pingfei Zhu
    Xin Yang
    Keqiu Li
    Changqing Ji
    [J]. The Journal of Supercomputing, 2014, 70 : 1249 - 1259
  • [3] An Optimized Version of the K-Means Clustering Algorithm
    Poteras, Cosmin Marian
    Mihaescu, Marian Cristian
    Mocanu, Mihai
    [J]. FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2014, 2014, 2 : 695 - 699
  • [4] Crime Analysis using k-means Clustering
    Joshi, Anant
    Sabitha, A. Sai
    Choudhury, Tanupriya
    [J]. 2017 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND NETWORKS (CINE), 2017 : 33 - 39
  • [5] An Integrated Clustering Framework Using Optimized K-means with Firefly and Canopies
    Nayak, S.
    Panda, C.
    Xalxo, Z.
    Behera, H. S.
    [J]. COMPUTATIONAL INTELLIGENCE IN DATA MINING, VOL 2, 2015, 32 : 333 - 343
  • [6] Optimized data fusion for K-means Laplacian clustering
    Yu, Shi
    Liu, Xinhai
    Tranchevent, Leon-Charles
    Glanzel, Wolfgang
    Suykens, Johan A. K.
    De Moor, Bart
    Moreau, Yves
    [J]. BIOINFORMATICS, 2011, 27 (01) : 118 - 126
  • [7] An Analysis of DRR Suggestions Using K-means Clustering
    Go Bui, Shelly Marie
    Gorro, Ken
    Angelo Aquino, Gio
    Jane Sabellano, Mary
    [J]. PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (ICIT 2017), 2017 : 76 - 80
  • [8] Optimized Data Fusion for Kernel k-Means Clustering
    Yu, Shi
    Tranchevent, Leon-Charles
    Liu, Xinhai
    Glanzel, Wolfgang
    Suykens, Johan A. K.
    De Moor, Bart
    Moreau, Yves
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (05) : 1031 - 1039
  • [9] ANALYSIS OF DUCTAL CARCINOMA USING K-MEANS CLUSTERING
    Vijayaraghavan, R.
    Eswari, C.
    Raajan, N. R.
    [J]. 2014 INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION SYSTEMS (ICECS), 2014,
  • [10] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163