Opinion mining on large scale data using sentiment analysis and k-means clustering

Cited by: 43
Authors
Riaz, Sumbal [1 ]
Fatima, Mehvish [1 ]
Kamran, M. [1 ]
Nisar, M. Wasif [1 ]
Affiliations
[1] COMSATS Inst Informat Technol, Dept Comp Sci, Wah Cantt, Pakistan
Keywords
Heterogeneous data processing; Imbalanced learning; Intelligent computing; CLASSIFICATION; ALGORITHMS; LEXICON; WORDS;
DOI
10.1007/s10586-017-1077-z
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
With the rapid growth of web technology and easy access to the internet, online shopping has increased. People now express their opinions and share their experiences online, which greatly influences new buyers' purchasing decisions and generates large data sets. This data is very helpful for analyzing customer preferences, needs, and behavior toward a product. Companies face the challenge of analyzing this sheer volume of data to extract customer opinion. To address this challenge, in this paper we performed phrase-level sentiment analysis on real-world customer review data to find customer preferences by analyzing subjective expressions. We then calculated the strength of each sentiment word to find the intensity of each expression, and applied clustering to place the words in clusters based on their intensity. We also compared the results of our technique with the star ranking given on the same dataset and found that the results differ drastically. Finally, we provide a visual representation of our results that gives clear insight into customer preference and behavior, to help decision makers make better decisions.
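As a rough illustration of the clustering step described in the abstract, the sketch below groups sentiment words by intensity using a one-dimensional k-means (Lloyd's algorithm). The word list, the strength scores, the choice of k = 3, and the function name `kmeans_1d` are all hypothetical and are not taken from the paper; the authors' actual scoring and clustering setup may differ.

```python
from typing import Dict, List


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> List[int]:
    """Cluster scalar values with Lloyd's algorithm; return one cluster index per value."""
    distinct = sorted(set(values))
    if not 1 <= k <= len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Deterministic initialisation (an assumption of this sketch):
    # spread the starting centroids evenly over the sorted distinct values.
    if k == 1:
        centroids = [distinct[0]]
    else:
        centroids = [distinct[(i * (len(distinct) - 1)) // (k - 1)] for i in range(k)]
    labels = [-1] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


# Hypothetical sentiment strengths in [-1, 1]; not data from the paper.
strengths = {
    "terrible": -0.9,
    "poor": -0.6,
    "okay": 0.1,
    "fine": 0.2,
    "good": 0.6,
    "excellent": 0.95,
}

labels = kmeans_1d(list(strengths.values()), k=3)

# Group the words by the cluster their strength was assigned to.
clusters: Dict[int, List[str]] = {}
for word, lab in zip(strengths, labels):
    clusters.setdefault(lab, []).append(word)
print(clusters)
```

With these made-up scores the words separate into a negative, a near-neutral, and a positive group. On real review data the number of clusters and the initialisation would need to be chosen with more care.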
Pages: S7149 - S7164
Number of pages: 16
Related papers
50 in total
  • [31] K-Means Clustering With Incomplete Data
    Wang, Siwei
    Li, Miaomiao
    Hu, Ning
    Zhu, En
    Hu, Jingtao
    Liu, Xinwang
    Yin, Jianping
    IEEE ACCESS, 2019, 7 : 69162 - 69171
  • [32] k-Means Clustering of Asymmetric Data
    Olszewski, Dominik
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT I, 2012, 7208 : 243 - 254
  • [33] Parallel k-Means Clustering for Quantitative Ecoregion Delineation Using Large Data Sets
    Kumar, Jitendra
    Mills, Richard T.
    Hoffman, Forrest M.
    Hargrove, William W.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS), 2011, 4 : 1602 - 1611
  • [34] Analysis of K-Means Clustering Algorithm: A Case Study Using Large Scale E-Commerce Products
    Mathivanan, Norsyela Muhammad Noor
    Ghani, Nor Azura Md
    Janor, Roziah Mohd
    2019 IEEE CONFERENCE ON BIG DATA AND ANALYTICS (ICBDA), 2019, : 41 - 44
  • [35] Characterization of the Power Quality in the Electric Distribution Networks Using Data Mining with K-Means Clustering
    Galbau, S.
    Grigoras, G.
    Neagu, B.
    Scarlatache, F.
    Lucache, D.
    Hustiuc, V
    2022 10TH INTERNATIONAL CONFERENCE ON SYSTEMS AND CONTROL (ICSC), 2022, : 131 - 136
  • [36] Clustering large datasets using Cobweb and K-means in tandem
    Li, M
    Holmes, G
    Pfahringer, B
    AI 2004: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3339 : 368 - 379
  • [37] Clustering Analysis of Multidimensional Wind Speed Data Using k-Means Approach
    Yesilbudak, Mehmet
    2016 IEEE INTERNATIONAL CONFERENCE ON RENEWABLE ENERGY RESEARCH AND APPLICATIONS (ICRERA), 2016, : 961 - 965
  • [38] Ensemble clustering using extended fuzzy k-means for cancer data analysis
    Khan, Imran
    Luo, Zongwei
    Shaikh, Abdul Khalique
    Hedjam, Rachid
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 172 (172)
  • [39] An Analysis of DRR Suggestions Using K-means Clustering
    Go Bui, Shelly Marie
    Gorro, Ken
    Angelo Aquino, Gio
    Jane Sabellano, Mary
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (ICIT 2017), 2017, : 76 - 80
  • [40] ANALYSIS OF DUCTAL CARCINOMA USING K-MEANS CLUSTERING
    Vijayaraghavan, R.
    Eswari, C.
    Raajan, N. R.
    2014 INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION SYSTEMS (ICECS), 2014,