The Research of Spam Web Page Detection Method Based on Web Page Differentiation and Concrete Cluster Centers

Cited by: 4
Authors
Yu, Mei [1, 2, 3, 4]
Zhang, Jie [2, 3, 4]
Wang, Jianrong [1, 2, 3, 4]
Gao, Jie [1, 3, 4]
Xu, Tianyi [1, 3, 4]
Yu, Ruiguo [1, 2, 3, 4]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Tianjin Univ, Tianjin Int Engn Inst, Tianjin, Peoples R China
[3] Tianjin Key Lab Adv Networking TANK Lab, Tianjin, Peoples R China
[4] Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
Keywords
Web page differentiation; Concrete cluster center; Spam web page detection; PageRank algorithm; K-Means algorithm
DOI
10.1007/978-3-319-94268-1_73
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To address two shortcomings of the PageRank algorithm, namely that it distributes link weights evenly and ignores the authority of web pages, we propose an improved PageRank algorithm based on web page differentiation (DPR). DPR evaluates a page's authority according to its number of links and, when distributing PR values, assigns link weights according to that authority. To improve the stability and accuracy of K-Means clustering, we combine DPR with K-Means and design a differentiation page-based K-Means (DPK-Means) algorithm. This algorithm sorts the pages by the PR values obtained from DPR and then determines concrete cluster centers according to the resulting ranking. Experiments on spam detection show that DPR outperforms PageRank in terms of the number of pages, recall, accuracy, and F-measure, and that DPK-Means performs better than K-Means.
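The abstract gives neither DPR's exact weighting formula nor the precise rule for choosing concrete cluster centers, so the following Python is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a page's authority is its total number of in-links plus out-links, and that each page splits its PR among its out-links in proportion to the targets' authority instead of evenly. For DPK-Means it assumes that the k pages at evenly spaced positions of the DPR ranking supply deterministic initial centers, after which standard K-Means iterations run. The function names `dpr`, `_dist2`, and `dpk_means`, their parameters, and the demo features are all hypothetical.

```python
"""Minimal sketch of DPR and DPK-Means under assumed rules (not the authors' code)."""


def dpr(nodes, edges, d=0.85, max_iter=100, tol=1e-10):
    """Differentiated PageRank sketch.

    Assumption: a page's authority is its total number of in- and out-links,
    and each page splits its PR among its out-links in proportion to the
    targets' authority instead of evenly (1 / out-degree).
    """
    out_links = {n: [] for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for u, v in edges:
        out_links[u].append(v)
        in_deg[v] += 1
    authority = {n: in_deg[n] + len(out_links[n]) for n in nodes}

    n_pages = len(nodes)
    pr = {n: 1.0 / n_pages for n in nodes}
    for _ in range(max_iter):
        new_pr = {n: (1.0 - d) / n_pages for n in nodes}
        for u in nodes:
            targets = out_links[u]
            if not targets:
                # Dangling page: spread its damped score evenly over all pages.
                for n in nodes:
                    new_pr[n] += d * pr[u] / n_pages
                continue
            total = sum(authority[v] for v in targets)
            for v in targets:
                # Authority-proportional weight replaces PageRank's even split.
                new_pr[v] += d * pr[u] * authority[v] / total
        delta = sum(abs(new_pr[n] - pr[n]) for n in nodes)
        pr = new_pr
        if delta < tol:
            break
    return pr


def _dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def dpk_means(features, scores, k, max_iter=100):
    """K-Means with deterministic initial centers taken from the DPR ranking.

    Assumption: the k initial centers are the feature vectors of the pages at
    evenly spaced positions of the ranking (requires 1 <= k <= len(features)).
    """
    # Rank pages by DPR score, highest first; ties broken by page id.
    ranked = sorted(features, key=lambda p: (-scores[p], p))
    step = len(ranked) / k
    centers = [features[ranked[int(i * step)]] for i in range(k)]

    assign = {}
    for _ in range(max_iter):
        new_assign = {
            p: min(range(k), key=lambda c: _dist2(x, centers[c]))
            for p, x in features.items()
        }
        if new_assign == assign:
            break  # Assignments are stable: converged.
        assign = new_assign
        for c in range(k):
            members = [features[p] for p in features if assign[p] == c]
            if members:  # Keep the old center if a cluster empties.
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centers


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
    scores = dpr(nodes, edges)
    # Hypothetical 2-D page features (e.g. content statistics) for clustering.
    features = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0), "d": (5.1, 5.0)}
    labels, _ = dpk_means(features, scores, k=2)
    print(scores)
    print(labels)
```

Taking the initial centers from the ranking rather than at random makes repeated runs produce identical clusters, which is one plausible reading of how DPK-Means improves cluster stability over K-Means.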
Pages: 820-826
Number of pages: 7