Evaluating Tag Quality for Blogger Modelling via Topic Models

Authors
Shan, Lili [1 ]
Sun, Chengjie [1 ]
Lin, Lei [1 ]
Liu, Ming [1 ]
Wang, Xiaolong [1 ]
Liu, Bingquan [1 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Keywords
tag quality evaluation; blog representation; semantic similarity; topic model
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Because bloggers can annotate their posts with tags, tags have become one of the most important resources for describing blogger features. However, tag quality is uneven, and not all tags are appropriate for representing a blogger's preferences. Poor or spam tags blur the distinction between a user's actual preferences and spam terms, so they should be detected before tags are used to label bloggers. This paper presents a detailed quantitative analysis of the categories of tag spam in the blogosphere. Taking advantage of the abundant text content in blog posts and the relatively stable semantic relationship between tags and their target posts, it proposes an unsupervised approach based on topic models to evaluate tag quality for blogger modelling. The latent interest topics of a blogger are mined through Latent Dirichlet Allocation (LDA) topic modelling. Each blog post of the blogger is represented as a distribution over latent topics, and each latent topic is a distribution over the words of the vocabulary. A tag is likewise expressed as a co-occurrence term vector. Finally, a scheme is devised to determine the similarity between each tag and its target blog post, and tags with low similarity values are identified as poor tags. The experimental results indicate that the proposed method outperforms the baselines on datasets collected from Sina Blog, one of the largest Chinese blog platforms.
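The abstract does not spell out the similarity scheme, so the following is only a minimal sketch of the general idea, under stated assumptions: the post's topic distribution (`THETA`) and the topic-word distributions (`PHI`) are hand-set toy values standing in for the output of an LDA model, each tag is a hypothetical dictionary of co-occurring term counts, and cosine similarity with a fixed threshold is an assumed stand-in for the paper's own scoring scheme. All names and numbers are illustrative and do not come from the paper.

```python
import math

# Toy vocabulary and hand-set stand-ins for LDA output (hypothetical values).
VOCAB = ["travel", "hotel", "flight", "recipe", "cooking", "discount"]

# PHI[k][w]: probability of word w under latent topic k (each row sums to 1).
PHI = [
    [0.40, 0.30, 0.30, 0.00, 0.00, 0.00],  # topic 0: travel-like words
    [0.00, 0.00, 0.00, 0.50, 0.50, 0.00],  # topic 1: food-like words
]

# THETA[k]: weight of topic k in one blog post (sums to 1).
THETA = [0.9, 0.1]


def post_word_distribution(theta, phi):
    """Mix topic-word distributions by topic weight: p(w|d) = sum_k theta_k * phi_kw."""
    n_words = len(phi[0])
    return [sum(theta[k] * phi[k][w] for k in range(len(theta)))
            for w in range(n_words)]


def tag_vector(cooccurrence, vocab):
    """Turn a {term: count} co-occurrence dictionary into a dense vector over vocab."""
    return [float(cooccurrence.get(term, 0)) for term in vocab]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_tags(tags, theta, phi, vocab):
    """Score each tag against the post's word distribution (assumed scheme)."""
    post_vec = post_word_distribution(theta, phi)
    return {tag: cosine(tag_vector(counts, vocab), post_vec)
            for tag, counts in tags.items()}


def poor_tags(scores, threshold):
    """Tags whose similarity falls below the (assumed) threshold."""
    return sorted(tag for tag, score in scores.items() if score < threshold)


# Hypothetical tags with the terms they co-occur with.
TAGS = {
    "vacation": {"travel": 8, "hotel": 5, "flight": 4},
    "sale": {"discount": 9},
}

scores = score_tags(TAGS, THETA, PHI, VOCAB)
print(poor_tags(scores, threshold=0.2))
```

In this toy setup the tag "sale" shares no terms with the post's word distribution and is flagged, while "vacation" aligns with the dominant topic. In practice `THETA` and `PHI` would be estimated from the blogger's posts by an LDA implementation, and the threshold would need to be tuned on data.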
Pages: 1770-1776
Number of pages: 7
Related papers
50 in total
  • [41] The role of hyper-parameters in relational topic models: Prediction capabilities vs topic quality
    Terragni, Silvia
    Candelieri, Antonio
    Fersini, Elisabetta
    INFORMATION SCIENCES, 2023, 632 : 252 - 268
  • [42] ASSESSMENT OF THE APPLIED QUALITY OF TOPIC MODELS FOR CLUSTERING PROBLEMS.
    Krasnov, F., V
    Baskakova, E. N.
    Smaznevich, I. S.
    VESTNIK TOMSKOGO GOSUDARSTVENNOGO UNIVERSITETA-UPRAVLENIE VYCHISLITELNAJA TEHNIKA I INFORMATIKA-TOMSK STATE UNIVERSITY JOURNAL OF CONTROL AND COMPUTER SCIENCE, 2021, (56): : 100 - 111
  • [43] Evaluating quality of enterprise modelling languages: The UEML solution
    Anaya, V.
    Berio, G.
    Verdecho, M. J.
    ENTERPRISE INTEROPERABILITY II: NEW CHALLENGES AND APPROACHES, 2007, : 237 - 240
  • [44] Evaluating the quality of process models: Empirical testing of a quality framework
    Moody, DL
    Sindre, G
    Brasethvik, T
    Solvberg, A
    CONCEPTUAL MODELING - ER 2002, 2002, 2503 : 380 - 396
  • [45] Evaluating the quality of reporting of melanoma prediction models
    Jiang, Matthew Y.
    Dragnev, Nathalie C.
    Wong, Sandra L.
    SURGERY, 2020, 168 (01) : 173 - 177
  • [46] New Quality Metrics for Evaluating Process Models
    Huang, Zan
    Kumar, Akhil
    BUSINESS PROCESS MANAGEMENT WORKSHOPS, 2009, 17 : 164 - 170
  • [47] A FRAMEWORK FOR EVALUATING AIR-QUALITY MODELS
    VENKATRAM, A
    BOUNDARY-LAYER METEOROLOGY, 1982, 24 (03) : 371 - 385
  • [48] Content analysis of psychological first aid training manuals via topic modelling
    Ni, Chung-Fan
    Lundblad, Robert
    Dykeman, Cass
    Bolante, Rebecca
    Labunski, Wojciech
    EUROPEAN JOURNAL OF PSYCHOTRAUMATOLOGY, 2023, 14 (02)
  • [49] A Simple Enhancement for Ad-hoc Information Retrieval via Topic Modelling
    Jian, Fanghong
    Huang, Jimmy Xiangji
    Zhao, Jiashu
    He, Tingting
    Hu, Po
    SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 733 - 736
  • [50] Metre and Semantics in the Poetry of Czech PostSymbolists Accessed via LDA Topic Modelling
    Plechac, Petr
    Kolar, Robert
    STUDIA METRICA ET POETICA, 2022, 9 (01): : 7 - 19