Evaluating Tag Quality for Blogger Modelling via Topic Models

Times cited: 0
Authors
Shan, Lili [1]
Sun, Chengjie [1]
Lin, Lei [1]
Liu, Ming [1]
Wang, Xiaolong [1]
Liu, Bingquan [1]
Affiliation
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Keywords
tag quality evaluation; blog representation; semantic similarity; topic model
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Because bloggers are permitted to annotate their blog posts with tags, tags have become one of the most important resources for describing blogger features. However, tag quality is uneven, and not all tags are appropriate for representing a blogger's preferences. Poor tags and spam tags blur the distinction between a user's actual preferences and spam terms, so they should be detected before tags are used directly to characterize bloggers. This paper presents a detailed quantitative analysis of the categories of tag spam in the blogosphere. Taking advantage of the abundant text content in blog posts and the relatively stable semantic relationship between tags and their target posts, it proposes an unsupervised approach based on topic models to evaluate tag quality for blogger modelling in the blogosphere. The latent interest topics of a blogger are mined through Latent Dirichlet Allocation (LDA) topic modelling. Each blog post of the blogger is represented as a distribution over latent topics, and each latent topic is a distribution over the words of the vocabulary. A tag is likewise expressed as a co-occurrence term vector. Finally, a scheme is devised to determine the similarity between each tag and its target blog post, and tags with low similarity values are identified as poor tags. The experimental results indicate that the proposed method outperforms the baselines on datasets collected from Sina Blog, one of the largest Chinese blog platforms.
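The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity scheme, so everything concrete here is an assumption made for illustration: the post's word distribution is taken as the topic mixture of the topic-word distributions, the tag is a raw co-occurrence count vector over the same vocabulary, similarity is plain cosine similarity, and the function names, toy numbers, and threshold are all hypothetical. In practice `theta` and `phi` would come from a trained LDA model rather than being written by hand.

```python
import math


def post_word_distribution(theta, phi):
    """Mix topic-word distributions by the post's topic weights.

    theta: list of K topic proportions for one post.
    phi:   K x V nested list, phi[k][w] = p(word w | topic k).
    Returns a length-V list with p(w | post) = sum_k theta[k] * phi[k][w].
    """
    vocab_size = len(phi[0])
    return [
        sum(theta[k] * phi[k][w] for k in range(len(theta)))
        for w in range(vocab_size)
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_poor_tags(tag_vectors, theta, phi, threshold):
    """Score each tag against the post and list tags scoring below the threshold.

    tag_vectors: dict mapping tag -> co-occurrence vector over the vocabulary.
    The cosine measure and the threshold are illustrative assumptions,
    not the scheme used in the paper.
    """
    post_dist = post_word_distribution(theta, phi)
    scores = {tag: cosine(vec, post_dist) for tag, vec in tag_vectors.items()}
    poor = sorted(tag for tag, score in scores.items() if score < threshold)
    return scores, poor


# Toy example: a 4-word vocabulary and 2 hand-written topics (hypothetical values).
phi = [
    [0.50, 0.40, 0.05, 0.05],  # topic 0 concentrates on words 0 and 1
    [0.05, 0.05, 0.50, 0.40],  # topic 1 concentrates on words 2 and 3
]
theta = [0.9, 0.1]  # the post is mostly about topic 0

tag_vectors = {
    "travel": [3, 2, 0, 0],  # co-occurs with the post's dominant words
    "loans": [0, 0, 4, 1],   # co-occurs with words the post barely uses
}

scores, poor = flag_poor_tags(tag_vectors, theta, phi, threshold=0.5)
print(poor)
```

With these toy values the tag whose co-occurrence vector aligns with the post's dominant topic scores high, while the off-topic tag falls below the threshold and is flagged. A real system would need a principled way to choose the threshold, for example from a labelled sample of tags.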
Pages: 1770-1776
Number of pages: 7