Semi-supervised sentiment clustering on natural language texts

Cited by: 0
Authors
Frigau, Luca [1]
Romano, Maurizio [1]
Ortu, Marco [1]
Contu, Giulia [1]
Affiliations
[1] Univ Cagliari, Dept Econ & Business Sci, Viale St Ignazio 17, I-09123 Cagliari, Italy
Source
STATISTICAL METHODS AND APPLICATIONS | 2023 / Vol. 32 / Issue 04
Keywords
Tb-NB; NeSSC; Reviews; Tourism data; Booking.com; CLASSIFICATION; FEATURES;
DOI
10.1007/s10260-023-00691-4
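Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification code
020208; 070103; 0714;
Abstract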
In this paper, we propose a semi-supervised method to cluster unstructured textual data called semi-supervised sentiment clustering on natural language texts. The aim is to identify clusters homogeneous with respect to the overall sentiment of the texts analyzed. The method combines different techniques and methodologies: Sentiment Analysis, Threshold-based Naive Bayes classifier, and Network-based Semi-supervised Clustering. It involves different steps. In the first step, the unstructured text is transformed into structured text, and it is categorized into positive or negative classes using a sentiment analysis algorithm. In the second step, the Threshold-based Naive Bayes classifier is applied to identify the overall sentiment of the texts and to define a specific sentiment value for the topics. In the last step, Network-based Semi-supervised Clustering is applied to partition the instances into disjoint groups. The proposed algorithm is tested on a collection of reviews written by customers on Booking.com. The results have highlighted the capacity of the proposed algorithm to identify clusters that are distinct, non-overlapped, and homogeneous with respect to the overall sentiment. Results are also easily interpretable thanks to the network representation of the instances that helps to understand the relationship between them.
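The second step in the abstract, scoring a text and comparing the score with a threshold, can be illustrated with a minimal sketch. This is not the authors' Tb-NB implementation: the smoothed log-odds estimator, the function names (`train_log_odds`, `score`, `classify`), the threshold `tau`, and the four toy reviews are all assumptions made for illustration only.

```python
import math
from collections import Counter

def train_log_odds(reviews, labels, alpha=1.0):
    """Per-word log-odds of occurring in a positive vs. a negative review.

    Assumption: presence/absence counts with Laplace smoothing (alpha);
    the paper's estimator may differ.
    """
    pos_docs, neg_docs = Counter(), Counter()
    n_pos = n_neg = 0
    for text, label in zip(reviews, labels):
        words = set(text.lower().split())  # count each word once per review
        if label == 1:
            n_pos += 1
            pos_docs.update(words)
        else:
            n_neg += 1
            neg_docs.update(words)
    vocab = set(pos_docs) | set(neg_docs)
    return {
        w: math.log((pos_docs[w] + alpha) / (n_pos + 2 * alpha))
        - math.log((neg_docs[w] + alpha) / (n_neg + 2 * alpha))
        for w in vocab
    }

def score(text, log_odds):
    """Sum the log-odds of the known words in a text; unknown words add 0."""
    return sum(log_odds.get(w, 0.0) for w in set(text.lower().split()))

def classify(text, log_odds, tau=0.0):
    """Return 1 (positive) if the score exceeds the threshold tau, else 0."""
    return 1 if score(text, log_odds) > tau else 0

# Hypothetical toy reviews, not taken from the Booking.com data set.
reviews = [
    "clean room friendly staff",
    "great location clean",
    "dirty room rude staff",
    "noisy dirty bathroom",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

log_odds = train_log_odds(reviews, labels)
print(classify("clean friendly staff", log_odds))  # 1
print(classify("dirty noisy room", log_odds))      # 0
```

In this sketch the per-word log-odds play the role of a word-level sentiment value, and `tau` is the single tunable decision boundary. How the threshold is selected, and how the resulting scores feed the network-based clustering step, is not specified in the abstract and is left out here.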
Pages: 1239-1257
Number of pages: 19