A Web Spam Link Detection Method Based on Web Page Structure and Text Features

Authors
Yang, Wang [1]
Jiang, Yong-Han [1]
Zhang, San-Feng [1]
Affiliations
[1] School of Cyber Science, Southeast University, Nanjing 211189, China
Keywords
Machine learning;
DOI
10.12068/j.issn.1005-3026.2020.08.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing spam website detection methods mainly target self-built spam websites; they are poorly suited to injected spam websites because their link detection is inefficient. This paper proposes a detection framework based on multi-dimensional features of webpage structure and text. The framework first divides each webpage into blocks. Content features are then extracted by computing odds ratios, and structural features based on tags, attribute keys, and attribute values are extracted using one-hot encoding. A detection model is trained with a suitable machine learning algorithm and used to detect spam links. Compared with algorithms based on content detection and on blacklist matching, the framework improves detection accuracy by up to 13%. © 2020, Editorial Department of Journal of Northeastern University. All rights reserved.
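The feature extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the block-division rule, the exact odds-ratio formula, the smoothing, or the vocabularies, so the names `parse_block`, `odds_ratio`, and `one_hot`, the smoothing constant `eps`, and the toy HTML blocks are all assumptions made for illustration.

```python
import math
from html.parser import HTMLParser


class BlockFeatureParser(HTMLParser):
    """Collects tags, attribute keys, attribute values, and words from one HTML block."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.attr_keys = set()
        self.attr_values = set()
        self.words = []

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        for key, value in attrs:
            self.attr_keys.add(key)
            if value:
                self.attr_values.add(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def parse_block(html):
    """Parse one already-separated block (the block division itself is assumed)."""
    parser = BlockFeatureParser()
    parser.feed(html)
    parser.close()
    return parser


def odds_ratio(word, spam_word_sets, normal_word_sets, eps=0.5):
    """Smoothed log odds ratio of a word occurring in spam vs. normal blocks.

    The eps smoothing is an assumption; it only avoids division by zero.
    """
    spam_with = sum(word in words for words in spam_word_sets)
    spam_without = len(spam_word_sets) - spam_with
    normal_with = sum(word in words for words in normal_word_sets)
    normal_without = len(normal_word_sets) - normal_with
    return math.log(
        ((spam_with + eps) * (normal_without + eps))
        / ((spam_without + eps) * (normal_with + eps))
    )


def one_hot(present, vocabulary):
    """Encode which vocabulary items (tags, attribute keys, ...) occur in a block."""
    return [1 if item in present else 0 for item in vocabulary]


# Toy blocks, invented for this example.
spam_blocks = [parse_block('<div style="display:none"><a href="http://spam.example">cheap pills</a></div>')]
normal_blocks = [parse_block('<p class="intro">campus news and events</p>')]

spam_word_sets = [set(block.words) for block in spam_blocks]
normal_word_sets = [set(block.words) for block in normal_blocks]

cheap_score = odds_ratio("cheap", spam_word_sets, normal_word_sets)
news_score = odds_ratio("news", spam_word_sets, normal_word_sets)
tag_vector = one_hot(spam_blocks[0].tags, ["a", "div", "p"])
```

In this sketch a positive score marks a word as more typical of spam blocks and a negative score marks it as more typical of normal blocks. The content scores and the one-hot vectors for tags, attribute keys, and attribute values would then be concatenated into a single feature vector per block and passed to whichever classifier is chosen.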
Pages: 1091-1096
Related papers
50 in total
  • [1] Spam web page detection using combined content and link features
    Roul, Rajendra Kumar
    Asthana, Shubham Rohan
    Kumar, Gaurav
    [J]. INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2016, 8 (03) : 209 - 222
  • [2] Web Spam Detection Based On Link Diversity and Content Features
    Xu Gongwen
    Li Xiaomei
    Zhang Zhijun
    Xu Li'Na
    [J]. INTERNATIONAL JOURNAL OF SECURITY AND ITS APPLICATIONS, 2016, 10 (07) : 363 - 372
  • [3] Web spam detection based on discriminative content and link features
    Mahmoudi, Maryam
    Yari, Alireza
    Khadivi, Shahram
    [J]. 2010 5th International Symposium on Telecommunications, IST 2010, 2010 : 542 - 546
  • [4] Web Spam: a Study of the Page Language Effect on the Spam Detection Features
    Alarifi, Abdulrahman
    Alsaleh, Mansour
    [J]. 2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012 : 216 - 221
  • [5] The Research of Spam Web Page Detection Method Based on Web Page Differentiation and Concrete Cluster Centers
    Yu, Mei
    Zhang, Jie
    Wang, Jianrong
    Gao, Jie
    Xu, Tianyi
    Yu, Ruiguo
    [J]. WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS (WASA 2018), 2018, 10874 : 820 - 826
  • [6] Detecting Web Spam Based on Novel Features from Web Page Source Code
    Liu, Jiayong
    Su, Yu
    Lv, Shun
    Huang, Cheng
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2020, 2020
  • [7] Detection of spam web page using content and link-based techniques: A combined approach
    Roul, Rajendra Kumar
    Asthana, Shubham Rohan
    Shah, Mit
    Parikh, Dhruvesh
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2016, 41 (02) : 193 - 202
  • [8] Link Analysis for Web Spam Detection
    Becchetti, Luca
    Castillo, Carlos
    Donato, Debora
    Baeza-Yates, Ricardo
    Leonardi, Stefano
    [J]. ACM TRANSACTIONS ON THE WEB, 2008, 2 (01)
  • [9] Improving SVM classifiers with link structure for web spam detection
    [J]. Zhang, H. (824223485@163.com), 1600, Binary Information Press (10):
  • [10] Novel Features for Web Spam Detection
    Kumar, Santosh
    Gao, Xiaoying
    Welch, Ian
    [J]. 2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016 : 593 - 597