Spam Comments Detection with Self-Extensible Dictionary and Text-Based Features

Times cited: 0
Authors
Zhang, Qiang [1]
Liu, Chenwei [1]
Zhong, Shangru [1]
Lei, Kai [1]
Affiliation
[1] Peking Univ, SECE, Shenzhen Key Lab Cloud Comp Technol & Applicat, Inst Big Data Technol, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
spam comments; spam dictionary; text-based features
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
New social media platforms have become popular channels for spreading information, allowing online users to publish the latest events and their personal opinions. However, massive numbers of spam comments seriously degrade users' reading experience. To detect spam comments in Chinese social media, we employ semantic analysis to build a self-extensible spam dictionary that automatically updates and extends itself with newly emerging cyber words. The semantic analysis also contributes extra semantic features that help in text classification. Based on a statistical analysis of microblog comments, we select four text-based features that capture the basic characteristics of Chinese spam comments. We use the spam dictionary and the text-based features to construct classifiers for detecting spam comments. Our method achieves an average detection accuracy of 93.6%, which compares favorably with existing spam comment detection methods. Experimental results demonstrate that the method can effectively detect spam comments on Chinese microblogs.
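The abstract does not specify how the dictionary extends itself, which four text-based features are used, or which classifiers are trained. The Python sketch below only illustrates the general pipeline (dictionary, then features, then classifier) under loudly stated assumptions: "semantic analysis" is replaced by cosine similarity between pre-computed word vectors (the `vectors` mapping and the 0.8 threshold are hypothetical), the four text features (length, digit share, link flag, exclamation count) are placeholders rather than the paper's features, and a nearest-centroid classifier stands in for the authors' classifiers. All class and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


class SpamDictionary:
    """Hypothetical self-extending spam dictionary (illustrative sketch only)."""

    def __init__(self, seed_words: Set[str], vectors: Dict[str, Sequence[float]],
                 threshold: float = 0.8) -> None:
        self.words = set(seed_words)
        self.vectors = vectors        # ASSUMPTION: pre-trained word vectors
        self.threshold = threshold    # ASSUMPTION: cosine-similarity cutoff

    @staticmethod
    def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def extend(self, candidates: Sequence[str]) -> List[str]:
        """Adopt unseen candidates whose vector is close to an existing entry.

        Candidates are compared only against entries present before this call.
        Returns the newly added words in candidate order.
        """
        known = [self.vectors[w] for w in self.words if w in self.vectors]
        added = []
        for word in candidates:
            if word in self.words or word not in self.vectors:
                continue
            if any(self._cosine(self.vectors[word], v) >= self.threshold for v in known):
                added.append(word)
        self.words.update(added)
        return added

    def hits(self, text: str) -> int:
        # Substring counting avoids the need for Chinese word segmentation.
        return sum(text.count(w) for w in self.words)


def features(text: str, spam_dict: SpamDictionary) -> List[float]:
    """Dictionary hits plus four PLACEHOLDER text features (not the paper's four)."""
    n = max(len(text), 1)
    return [
        float(spam_dict.hits(text)),
        len(text) / 100.0,                                   # crude length scaling
        sum(ch.isdigit() for ch in text) / n,                # share of digit characters
        1.0 if ("http" in text or "www." in text) else 0.0,  # link present
        float(text.count("!") + text.count("！")),           # ASCII and full-width "!"
    ]


class NearestCentroid:
    """Minimal nearest-centroid classifier, standing in for the paper's classifiers."""

    def fit(self, X: List[List[float]], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: List[float]) -> int:
        # Pick the label whose centroid has the smallest squared Euclidean distance.
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


if __name__ == "__main__":
    spam_dict = SpamDictionary({"加微信", "代购"},
                               {"加微信": [1.0, 0.0], "加薇": [0.9, 0.1], "天气": [0.0, 1.0]})
    print(spam_dict.extend(["加薇", "天气"]))  # "加薇" is adopted as a variant of "加微信"
    train = [("加薇123456代购!!!", 1), ("代购 www.example.com", 1),
             ("今天天气不错", 0), ("同意楼主的看法", 0)]
    clf = NearestCentroid().fit([features(t, spam_dict) for t, _ in train],
                                [label for _, label in train])
    print(clf.predict(features("加薇 代购 888!!", spam_dict)))
```

Counting dictionary entries as substrings keeps the sketch dependency-free; a fuller implementation would probably segment comments into words and obtain `vectors` from an embedding model trained on microblog text.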
Pages: 1225-1230
Number of pages: 6