Using Hashtag Graph-Based Topic Model to Connect Semantically-Related Words Without Co-Occurrence in Microblogs

Cited by: 50
Authors
Wang, Yuan [1, 2]
Liu, Jie [1, 2]
Huang, Yalou [1, 2]
Feng, Xia [3]
Affiliations
[1] Nankai Univ, Coll Comp & Control Engn, Tianjin 300071, Peoples R China
[2] Nankai Univ, Coll Software, Tianjin 300071, Peoples R China
[3] Civil Aviat Univ China, Informat Technol Res Base Civil Aviat Adm China, Tianjin 300071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Hashtag graph; topic modeling; sparseness of short text; weakly-supervised learning;
DOI
10.1109/TKDE.2016.2531661
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a new topic model that uses hashtag graphs to make sense of the chaotic microblogging environment. Inferring topics on Twitter has become a vital but challenging task in many important applications. The shortness and informality of tweets lead to extremely sparse vector representations over a large vocabulary. As a result, conventional topic models (e.g., latent Dirichlet allocation [1] and latent semantic analysis [2]) fail to learn high-quality topic structures. Tweets often come with rich user-generated hashtags, which make them semi-structured internally and semantically related to each other. Since hashtags are used as keywords in tweets to mark messages or to form conversations, they provide an additional path for connecting semantically related words. Treating tweets as semi-structured texts, we propose a novel topic model, the Hashtag Graph-based Topic Model (HGTM), to discover the topics of tweets. By utilizing the hashtag relation information in hashtag graphs, HGTM is able to discover semantic relations between words even when the words do not co-occur within any single tweet. In this way, HGTM alleviates the sparsity problem. Our investigation shows that user-contributed hashtags can serve as weakly-supervised information for topic modeling, and that the relations between hashtags can reveal latent semantic relations between words. We evaluate the effectiveness of HGTM on tweet (hashtag) clustering and hashtag classification problems. Experiments on two real-world tweet data sets show that HGTM has a strong capability to handle the sparseness and noise of tweets. Furthermore, HGTM discovers more distinct and coherent topics than the state-of-the-art baselines.
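The intuition that hashtag relations can connect words that never share a tweet can be illustrated with a small sketch. This is not the HGTM model itself, which is a probabilistic topic model whose details are not given in the abstract. It only builds a hashtag co-occurrence graph from toy data and checks whether two words are reachable through related hashtags. The tweets, the edge-weight definition, and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy tweets (hypothetical data): each entry is (words, hashtags).
tweets = [
    (["goal", "striker"], ["#football", "#worldcup"]),
    (["referee", "penalty"], ["#worldcup"]),
    (["transfer", "striker"], ["#football"]),
    (["election", "vote"], ["#politics"]),
]


def build_hashtag_graph(tweets):
    """Assumed edge weight: number of tweets in which two hashtags appear together."""
    graph = defaultdict(Counter)
    for _, tags in tweets:
        for a, b in combinations(sorted(set(tags)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def words_by_hashtag(tweets):
    """Map each hashtag to the set of words used in tweets carrying it."""
    index = defaultdict(set)
    for words, tags in tweets:
        for tag in tags:
            index[tag].update(words)
    return index


def cooccur_in_tweet(tweets, w1, w2):
    """True if both words appear in at least one common tweet."""
    return any(w1 in words and w2 in words for words, _ in tweets)


def linked_via_hashtags(tweets, w1, w2):
    """True if the words share a hashtag or sit on two adjacent hashtags."""
    graph = build_hashtag_graph(tweets)
    index = words_by_hashtag(tweets)
    tags1 = {t for t, ws in index.items() if w1 in ws}
    tags2 = {t for t, ws in index.items() if w2 in ws}
    if tags1 & tags2:
        return True
    return any(t2 in graph[t1] for t1 in tags1 for t2 in tags2)


if __name__ == "__main__":
    print(cooccur_in_tweet(tweets, "transfer", "penalty"))
    print(linked_via_hashtags(tweets, "transfer", "penalty"))
    print(linked_via_hashtags(tweets, "transfer", "vote"))
```

In this toy data, "transfer" and "penalty" never appear in the same tweet, yet they are connected because #football and #worldcup co-occur in another tweet. A word-only co-occurrence model would see no relation between them.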
Pages: 1919-1933
Page count: 15