Semi-supervised sentiment clustering on natural language texts

Cited by: 0
Authors
Frigau, Luca [1]
Romano, Maurizio [1]
Ortu, Marco [1]
Contu, Giulia [1]
Affiliations
[1] Univ Cagliari, Dept Econ & Business Sci, Viale St Ignazio 17, I-09123 Cagliari, Italy
Source
STATISTICAL METHODS AND APPLICATIONS | 2023 / Volume 32 / Issue 04
Keywords
Tb-NB; NeSSC; Reviews; Tourism data; Booking.com; CLASSIFICATION; FEATURES
DOI
10.1007/s10260-023-00691-4
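Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract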
In this paper, we propose a semi-supervised method for clustering unstructured textual data, called semi-supervised sentiment clustering on natural language texts. The aim is to identify clusters that are homogeneous with respect to the overall sentiment of the texts analyzed. The method combines three techniques: Sentiment Analysis, the Threshold-based Naive Bayes classifier, and Network-based Semi-supervised Clustering. It proceeds in three steps. In the first step, the unstructured text is transformed into structured text and categorized into positive or negative classes using a sentiment analysis algorithm. In the second step, the Threshold-based Naive Bayes classifier is applied to identify the overall sentiment of the texts and to define a specific sentiment value for the topics. In the last step, Network-based Semi-supervised Clustering is applied to partition the instances into disjoint groups. The proposed algorithm is tested on a collection of reviews written by customers on Booking.com. The results highlight the capacity of the proposed algorithm to identify clusters that are distinct, non-overlapping, and homogeneous with respect to the overall sentiment. The results are also easily interpretable thanks to the network representation of the instances, which helps in understanding the relationships between them.
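The second step of the abstract can be illustrated with a small sketch. This is a generic Naive Bayes scorer with a decision threshold, offered only as an illustration of the idea; it is not the authors' Tb-NB implementation. The function names, the Laplace smoothing parameter `alpha`, and the threshold `tau` are assumptions introduced here and do not come from the record.

```python
import math
from collections import Counter


def train_log_odds(docs, labels, alpha=1.0):
    """Return per-word log-likelihood ratios log P(w|pos) - log P(w|neg).

    docs   : list of token lists (already structured text)
    labels : list of 1 (positive) or 0 (negative), one per document
    alpha  : Laplace smoothing constant (an assumption of this sketch)
    """
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (pos if label == 1 else neg).update(tokens)
    vocab = set(pos) | set(neg)
    # Smoothed denominators: total word count per class plus alpha per vocabulary word.
    n_pos = sum(pos.values()) + alpha * len(vocab)
    n_neg = sum(neg.values()) + alpha * len(vocab)
    return {
        w: math.log((pos[w] + alpha) / n_pos) - math.log((neg[w] + alpha) / n_neg)
        for w in vocab
    }


def score(tokens, log_odds):
    """Sum the log-odds of known words; unknown words contribute 0."""
    return sum(log_odds.get(w, 0.0) for w in tokens)


def classify(tokens, log_odds, tau=0.0):
    """Label a text positive (1) when its score exceeds the threshold tau."""
    return 1 if score(tokens, log_odds) > tau else 0
```

In this sketch the per-word log-odds play the role of a sentiment value for each term, and `tau` shifts the decision boundary; how the published method sets its threshold and aggregates values by topic is not described in this record.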
Pages: 1239-1257
Number of pages: 19