Word Embedding based Clustering to Detect Topics in Social Media

Cited by: 28
Authors
Comito, Carmela [1]
Forestiero, Agostino [1]
Pizzuti, Clara [1]
Affiliations
[1] National Research Council of Italy (CNR), Institute for High Performance Computing and Networking (ICAR), Arcavacata di Rende, Italy
Keywords
Social Media; Topic Detection; Word Embedding; Clustering
DOI
10.1145/3350546.3352518
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Social media are playing an increasingly important role in reporting major events happening in the world. However, detecting events and topics of interest from social media is a challenging task due to the huge magnitude of the data and the complex semantics of the language being processed. The paper proposes an online algorithm to discover topics that incrementally groups short text by incorporating the textual content with latent feature vector representations of words appearing in the text, trained on very large corpora to improve the check-in topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, the approach obtains significant improvements with respect to classical topic detection methods.
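The idea of incrementally grouping short texts with the help of pretrained word vectors can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract; it swaps in a simple averaged-vector representation and a cosine-similarity threshold. The names `TOY_VECTORS`, `embed`, `OnlineClusterer`, and the threshold of 0.8 are all assumptions made for illustration, and the three-dimensional vectors are invented stand-ins for embeddings that would normally be trained on a very large external corpus.

```python
import math

# Hypothetical toy embeddings; a real system would load vectors
# pretrained on a large external corpus.
TOY_VECTORS = {
    "earthquake": [0.9, 0.1, 0.0],
    "quake": [0.85, 0.15, 0.0],
    "tremor": [0.8, 0.2, 0.1],
    "concert": [0.0, 0.9, 0.2],
    "music": [0.1, 0.85, 0.3],
    "festival": [0.05, 0.8, 0.35],
}


def embed(text, vectors):
    """Average the vectors of known words; return None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OnlineClusterer:
    """Assign each incoming text to the closest cluster or open a new one."""

    def __init__(self, vectors, threshold=0.8):
        self.vectors = vectors
        self.threshold = threshold  # assumed value, not taken from the paper
        self.centroids = []         # one running-mean vector per cluster
        self.members = []           # texts assigned to each cluster

    def add(self, text):
        vec = embed(text, self.vectors)
        if vec is None:
            return None  # no known words, so the text cannot be placed
        best, best_sim = None, -1.0
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= self.threshold:
            # Update the running mean of the matched cluster.
            n = len(self.members[best])
            self.centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(self.centroids[best], vec)
            ]
            self.members[best].append(text)
            return best
        # No cluster is similar enough: start a new one.
        self.centroids.append(vec)
        self.members.append([text])
        return len(self.centroids) - 1
```

Each call to `add` processes one text and never revisits earlier assignments, which is what makes the sketch online. The threshold controls granularity: a higher value yields more, tighter clusters.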
Pages: 192-199
Number of pages: 8