Identification of domain-specific euphemistic tweets using clustering

Authors
Devi M.D. [1]
Saharia N. [1]
Affiliation
[1] Data Engineering Lab, Department of Computer Science & Engineering, IIIT Senapati, Manipur, Imphal
Keywords
DBSCAN; Domain-specific euphemistic text; Euphemism; K-means; Silhouette score
DOI
10.1007/s41870-023-01595-y
Abstract
Social media platforms (SMPs) are nowadays frequently used as a readily accessible and comprehensive medium for expressing personal opinions. Euphemism, a linguistic strategy in which the underlying sentiment of expressive content is veiled by mild language, has long been practised on SMPs to reduce harshness or to discuss sensitive topics [1]. Identifying the masked content [2] in euphemisms is challenging because of their inherent nature. This study proposes an identification mechanism that detects domain-specific euphemisms using clustering techniques. The pattern categorization feature is built from domain-specific lexical features combined with frequency-based features. To identify the most suitable match, the hybrid feature extraction algorithms combine frequency-based uni-gram and bi-gram features with a lexicon. The dimension reduction phase addresses feature sparsity and identifies the most significant words for each sample, so that the samples can be classified into different domains using centroid-based and density-based clustering techniques. The DBSCAN algorithm, run with an epsilon value of 2.5 and a minimum number of points of 6, identifies 7 distinct clusters. The Silhouette score is used to determine the optimal value of k for the K-means algorithm. The resulting clusters are examined manually. We also compare our model on the FLUTE dataset, using DBSCAN with an epsilon value of 0.2 and a minimum number of points of 5, and obtain a validation score of 0.55. The DBSCAN algorithm generates distinct clusters that extend beyond the scope of the inquiry domain. © 2023, The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management.
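The abstract names the pipeline stages (frequency-based uni-gram and bi-gram features combined with a domain lexicon, a dimension reduction step, DBSCAN with epsilon 2.5 and minimum points 6, and K-means with k chosen by the Silhouette score) but gives no code. The following is a minimal sketch of such a pipeline, assuming scikit-learn. The lexicon, the toy tweets, the use of raw term counts as the frequency feature, TruncatedSVD as the reduction step, and the helper names `lexicon_features`, `hybrid_features`, and `best_kmeans` are all hypothetical illustrations, not the authors' implementation.

```python
# Sketch of a hybrid-feature clustering pipeline (HYPOTHETICAL, not the paper's code).
# Assumptions: scikit-learn/SciPy as toolkit, raw n-gram counts as the frequency
# feature, a toy lexicon, and TruncatedSVD as the dimension-reduction step.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical domain lexicon: euphemistic phrase -> domain label.
HYPOTHETICAL_LEXICON = {
    "passed away": "death",
    "let go": "employment",
    "between jobs": "employment",
    "big boned": "body",
}
DOMAINS = sorted(set(HYPOTHETICAL_LEXICON.values()))
DOMAIN_INDEX = {d: i for i, d in enumerate(DOMAINS)}


def lexicon_features(texts):
    """One column per domain: number of lexicon phrases (substring matches) per text."""
    rows = np.zeros((len(texts), len(DOMAINS)))
    for r, text in enumerate(texts):
        lowered = text.lower()
        for phrase, domain in HYPOTHETICAL_LEXICON.items():
            rows[r, DOMAIN_INDEX[domain]] += lowered.count(phrase)
    return csr_matrix(rows)


def hybrid_features(texts, n_components=3):
    """Uni-gram + bi-gram counts joined with lexicon counts, reduced with SVD."""
    ngrams = CountVectorizer(ngram_range=(1, 2)).fit_transform(texts)
    combined = hstack([ngrams, lexicon_features(texts)]).tocsr()
    # TruncatedSVD works directly on the sparse matrix and returns a dense array.
    svd = TruncatedSVD(n_components=n_components, random_state=0)
    return svd.fit_transform(combined)


def best_kmeans(X, k_range=range(2, 6)):
    """Try each k and keep the one with the highest silhouette score."""
    best = None
    for k in k_range:
        if k >= len(X):  # silhouette needs at most n_samples - 1 clusters
            break
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best  # (k, silhouette score, labels)


# Toy usage on invented tweets.
tweets = [
    "my grandfather passed away last night",
    "so sad, she passed away peacefully",
    "he passed away after a long illness",
    "got let go from work today",
    "currently between jobs, any leads?",
    "they let go half the team",
    "doctor says I'm just big boned",
    "big boned and proud of it",
]
X = hybrid_features(tweets)
# eps / min_samples as reported in the abstract; label -1 marks noise points.
db_labels = DBSCAN(eps=2.5, min_samples=6).fit_predict(X)
k, score, km_labels = best_kmeans(X)
print("DBSCAN labels:", db_labels)
print(f"K-means: k={k}, silhouette={score:.3f}, labels={km_labels}")
```

Two caveats apply to this sketch. scikit-learn's `min_samples` counts the point itself, which may differ from the paper's definition of the minimum number of points. The epsilon values in the abstract (2.5 for the authors' data, 0.2 for FLUTE) are tied to the authors' feature space, so they would need re-tuning for any other feature representation.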
Pages: 21-31
Number of pages: 10