Spammer Classification Using Ensemble Methods over Content-Based Features

Cited by: 8
Authors
Makkar, Aaisha [1]
Goel, Shivani [1]
Affiliation
[1] Thapar Univ, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Web spamming; Machine learning; Boosting; Ensemble;
DOI
10.1007/978-981-10-3325-4_1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As the number of web documents grows at a large scale, it is becoming very difficult to access useful information. Search engines play a major role in the retrieval of relevant information and knowledge. They manage large amounts of information with efficient page-ranking algorithms. Still, web spammers try to intrude into search engine results through various web spamming techniques for their personal benefit. According to a March 2016 report from Internetlivestats, an Internet survey company, there are currently 3.4 billion Internet users in the world. From this survey it can be judged that search engines play a vital role in the retrieval of information. In this research, we investigate fifteen different machine learning classification algorithms over content-based features to classify spam and non-spam web pages. An ensemble is then built from the three algorithms judged best on the basis of various parameters. Ten-fold cross-validation is also used.
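The abstract's evaluation setup (an ensemble of three classifiers, scored with ten-fold cross-validation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the three selected algorithms or the content features, so the base learner below is a hypothetical one-feature threshold rule, the ensemble is a plain majority vote, and the data are synthetic toy values rather than results from the paper.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the base classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_stump(rows, labels, feature):
    """Hypothetical base learner: threshold one feature at the midpoint
    between the spam (label 1) and non-spam (label 0) class means."""
    spam = [r[feature] for r, y in zip(rows, labels) if y == 1]
    ham = [r[feature] for r, y in zip(rows, labels) if y == 0]
    mean_spam = sum(spam) / len(spam)
    mean_ham = sum(ham) / len(ham)
    threshold = (mean_spam + mean_ham) / 2
    spam_is_high = mean_spam >= mean_ham

    def predict(row):
        return int((row[feature] >= threshold) == spam_is_high)

    return predict


def cross_validate(rows, labels, k=10):
    """Mean accuracy of a three-stump majority-vote ensemble over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        # One stump per (assumed) content feature; three models in total.
        models = [train_stump(train_rows, train_labels, f) for f in range(3)]
        correct = sum(
            majority_vote([m(rows[i]) for m in models]) == labels[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def make_toy_data(n=200, seed=1):
    """Synthetic data only: three made-up features, shifted upward for spam."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2  # alternate labels so both classes appear in every split
        rows.append([rng.gauss(1.0 + y, 0.5) for _ in range(3)])
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    toy_rows, toy_labels = make_toy_data()
    print(f"mean 10-fold accuracy: {cross_validate(toy_rows, toy_labels):.3f}")
```

An odd number of base classifiers keeps a binary majority vote free of ties, which is one practical reason to combine three models rather than two or four.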
Pages: 1-9
Number of pages: 9