Evaluating Tag Quality for Blogger Modelling via Topic Models

Authors
Shan, Lili [1]
Sun, Chengjie [1]
Lin, Lei [1]
Liu, Ming [1]
Wang, Xiaolong [1]
Liu, Bingquan [1]
Affiliation
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, People's Republic of China
Keywords
tag quality evaluation; blog representation; semantic similarity; topic model
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Because blog posts can be annotated with tags, tags have become one of the most important resources for describing blogger features. However, tag quality is uneven, so not all tags are appropriate for representing a blogger's preferences. Poor or spam tags blur the line between a user's actual preferences and spam terms, so they should be detected before tags are used directly to profile bloggers. This paper presents a detailed quantitative analysis of the categories of tag spam in the blogosphere. Taking advantage of the abundant text in blog posts and the relatively stable semantic relationship between tags and their target posts, it proposes an unsupervised, topic-model-based approach to evaluating tag quality for blogger modelling. The latent interest topics of a blogger are mined with Latent Dirichlet Allocation (LDA): each of the blogger's posts is represented as a distribution over latent topics, and each latent topic is a distribution over the words of the vocabulary. Each tag is likewise expressed as a specific co-occurrence term vector. Finally, a scheme is devised to measure the similarity between each tag and its target blog post, and tags with low similarity values are identified as poor tags. Experimental results on datasets collected from Sina Blog, one of the largest Chinese blogging platforms, indicate that the proposed method performs better than the baselines.
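The abstract does not spell out the exact similarity scheme, so the following is only a minimal sketch of one plausible reading of the pipeline, not the authors' method. It assumes that an LDA model has already been trained and supplies a post's topic distribution `theta` (length K) and the K x V topic-word matrix `phi`. It further assumes that a tag's "co-occurrence term vector" counts the vocabulary words appearing in posts that carry the tag, that tag and post are compared by cosine similarity against the post's expected word distribution p(w|d) = sum_k theta_k * phi_{k,w}, and that tags scoring below a hypothetical `threshold` are flagged as poor. All function names, the toy vocabulary and the threshold value are illustrative.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Assumption: theta (one post's topic distribution, length K) and phi
# (K x V topic-word matrix) come from an already trained LDA model.
# Training LDA itself is out of scope for this sketch.


def post_word_distribution(theta: Sequence[float],
                           phi: Sequence[Sequence[float]]) -> List[float]:
    """Expected word distribution of a post: p(w|d) = sum_k theta[k] * phi[k][w]."""
    vocab_size = len(phi[0])
    return [sum(theta[k] * phi[k][w] for k in range(len(theta)))
            for w in range(vocab_size)]


def tag_cooccurrence_vector(tag: str,
                            posts: Iterable[Tuple[Set[str], List[str]]],
                            vocab: Sequence[str]) -> List[float]:
    """Hypothetical tag representation: counts of vocabulary words in posts carrying `tag`."""
    index = {word: i for i, word in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for tags, words in posts:
        if tag in tags:
            for word in words:
                if word in index:
                    vec[index[word]] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_poor_tags(theta: Sequence[float],
                   phi: Sequence[Sequence[float]],
                   tag_vectors: Dict[str, List[float]],
                   threshold: float) -> Tuple[Dict[str, float], List[str]]:
    """Score each tag against its target post; tags scoring below `threshold` are flagged."""
    post_vec = post_word_distribution(theta, phi)
    scores = {tag: cosine(vec, post_vec) for tag, vec in tag_vectors.items()}
    poor = sorted(tag for tag, score in scores.items() if score < threshold)
    return scores, poor


if __name__ == "__main__":
    vocab = ["python", "code", "loan", "cheap"]
    # Toy two-topic model: topic 0 looks like "programming", topic 1 like "loan spam".
    phi = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
    theta = [1.0, 0.0]  # the target post is entirely about topic 0
    posts = [({"programming"}, ["python", "code", "python"]),
             ({"cheap-loans"}, ["loan", "loan", "cheap"])]
    tag_vectors = {t: tag_cooccurrence_vector(t, posts, vocab)
                   for t in ("programming", "cheap-loans")}
    scores, poor = flag_poor_tags(theta, phi, tag_vectors, threshold=0.3)
    print(scores)
    print(poor)
```

In this toy run, the "programming" tag scores about 0.95 against the post, while "cheap-loans" shares no probability mass with the post's word distribution, scores 0.0, and is the only tag flagged. Comparing in word space keeps the sketch short. Projecting the tag vector into the K-dimensional topic space is an equally plausible alternative that the abstract does not rule out.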
Pages: 1770-1776
Page count: 7