Evaluating Tag Quality for Blogger Modelling via Topic Models

Times cited: 0
Authors
Shan, Lili [1]
Sun, Chengjie [1]
Lin, Lei [1]
Liu, Ming [1]
Wang, Xiaolong [1]
Liu, Bingquan [1]
Affiliation
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Keywords
tag quality evaluation; blog representation; semantic similarity; topic model
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline code
081202
Abstract
Because blog posts can be annotated with tags, tags have become one of the most important resources for describing blogger features. However, tag quality is uneven, so not all tags are appropriate for representing a blogger's preferences. Poor or spam tags conflate users' actual preferences with spam terms, so they should be detected before tags are used directly to profile bloggers. This paper presents a detailed quantitative analysis of the categories of tag spam in the blogosphere. Taking advantage of the abundant text content of blog posts and the relatively stable semantic relationship between tags and their target posts, an unsupervised approach based on topic models is proposed to evaluate tag quality for blogger modelling. A blogger's latent interest topics are mined with Latent Dirichlet Allocation (LDA). Each blog post of the blogger is represented as a distribution over latent topics, and each latent topic is a distribution over the words of the vocabulary. A tag is likewise expressed as a specific co-occurrence term vector. Finally, a scheme is devised to compute the similarity between each tag and its target blog post. Tags with low similarity values are then identified as poor tags. Experimental results on datasets collected from Sina Blog, one of the largest Chinese blogging platforms, indicate that the proposed method outperforms the baselines.
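The abstract does not spell out the similarity scheme. The sketch below shows one plausible reading, built on our own assumptions:

- A post's LDA topic mixture theta is turned into a word distribution p(w|d) = sum_k theta_k * phi_{k,w}, where phi holds the topic-word distributions.
- That word distribution is compared with a tag's co-occurrence term vector by cosine similarity.
- Tags scoring below a threshold are flagged as poor.

The function names (`post_word_distribution`, `score_tags`), the choice of cosine similarity, the threshold of 0.5 and the toy numbers are all hypothetical. They are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def post_word_distribution(theta: Sequence[float],
                           phi: Sequence[Sequence[float]]) -> List[float]:
    """p(w | post) = sum_k theta[k] * phi[k][w].

    theta: the post's distribution over K latent topics (as LDA would give).
    phi:   K topic-word distributions over a shared vocabulary.
    """
    vocab_size = len(phi[0])
    return [sum(theta[k] * phi[k][w] for k in range(len(theta)))
            for w in range(vocab_size)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_tags(theta: Sequence[float],
               phi: Sequence[Sequence[float]],
               tag_vectors: Dict[str, Sequence[float]],
               threshold: float) -> Tuple[Dict[str, float], List[str]]:
    """Score each tag against its target post (assumed scheme).

    Tags whose similarity falls below the hypothetical threshold are
    returned, sorted by name, as poor tags.
    """
    post_vec = post_word_distribution(theta, phi)
    sims = {tag: cosine(post_vec, vec) for tag, vec in tag_vectors.items()}
    poor = sorted(tag for tag, s in sims.items() if s < threshold)
    return sims, poor


if __name__ == "__main__":
    # Toy vocabulary: football, goal, recipe, flour, free, click.
    # Two hand-made topics; illustrative numbers only.
    phi = [
        [0.50, 0.40, 0.00, 0.00, 0.05, 0.05],  # "sports" topic
        [0.00, 0.00, 0.50, 0.40, 0.05, 0.05],  # "baking" topic
    ]
    theta = [0.9, 0.1]  # a post that is mostly about sports
    # Hypothetical co-occurrence counts of each tag with the vocabulary words.
    tags = {
        "soccer": [3, 2, 0, 0, 0, 0],
        "baking": [0, 0, 2, 3, 0, 0],
        "free-prize": [0, 0, 0, 0, 4, 4],
    }
    sims, poor = score_tags(theta, phi, tags, threshold=0.5)
    for tag, s in sorted(sims.items()):
        print(f"{tag}: {s:.3f}")
    print("poor tags:", poor)
```

In this toy run, the off-topic tag `baking` and the spam-like tag `free-prize` fall below the threshold, while `soccer` stays well above it. The sketch deliberately compares in word space rather than topic space. Spam words spread evenly across all topics would otherwise project to a flat topic vector, and that flat vector can still look moderately similar to the post.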
Pages: 1770-1776
Page count: 7
Related papers
  • [21] Tagged Image Clustering via Topic Models
    Cui, Junjun
    Liu, Lizhen
    Wang, Hanshi
    Du, Chao
    Song, Wei
    2015 27TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2015, : 4424 - 4429
  • [22] Service quality evaluating models
    Tohidi, Hamid
    Jabbari, Mohammad Mehdi
    WORLD CONFERENCE ON LEARNING, TEACHING & ADMINISTRATION - 2011, 2012, 31 : 861 - 865
  • [23] Evaluating the quality of reference models
    Misic, VB
    Zhao, JL
    CONCEPTUAL MODELING ER 2000, PROCEEDINGS, 2000, 1920 : 484 - 498
  • [24] Research on Mining Common Concern via Infinite Topic Modelling
    Miao, Yishu
    Li, Chunping
    Ding, Qiang
    Li, Li
    2012 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY WORKSHOPS (WI-IAT WORKSHOPS 2012), VOL 3, 2012, : 180 - 184
  • [25] Addressing topic modelling via reduced latent space clustering
    Schiavon, Lorenzo
    STATISTICAL METHODS AND APPLICATIONS, 2025
  • [26] Prediction Focused Topic Models via Feature Selection
    Ren, Jason
    Kunes, Russell
    Doshi-Velez, Finale
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 4420 - 4428
  • [27] Model Selection for Topic Models via Spectral Decomposition
    Cheng, Dehua
    He, Xinran
    Liu, Yan
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38 : 183 - 191
  • [28] Evaluating stochastic rainfall models for hydrological modelling
    Nguyen, Thien Huy Truong
    Bennett, Bree
    Leonard, Michael
    JOURNAL OF HYDROLOGY, 2023, 627
  • [29] Quality Assessment of Wikipedia Content Using Topic Models
    Santos, Lauro C. J.
    Christofani, Tais
    Silva, Ismael S.
    Dalip, Daniel H.
    WEBMEDIA 2019: PROCEEDINGS OF THE 25TH BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB, 2019, : 249 - 252
  • [30] Investigation of the Quality of Topic Models for Noisy Data Sources
    Geeganage, Dakshi T. Kapugamam
    Xu, Yue
    Li, Yuefeng
    2018 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2018), 2018, : 488 - 493