Scalable Spam Classifier for Web Tables

Authors
Villasenor, Santiago [1]
Nguyen, Tom [1]
Kola, Anusha [1]
Soderman, Sean [1]
Gubanov, Michael [1]
Affiliation
[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
Keywords
Web-search; Large-scale Data Management; Cloud Computing; Data Fusion and Cleaning; Summarization; Human-Computer Interaction
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Internet mail spam is a problem for most organizations and individuals. Spam can be classified into two categories: fraud and commercial. The fraud category includes phishing, scams, malware, counterfeit products, and any other criminal activity. The commercial category includes unwanted promotional messages and newsletters that legitimate organizations send illegally. Fraud can be regarded as a high threat with high volume, whereas commercial spam is the opposite. Similarly to mail, there are spam Web tables that contain no useful content. Here we describe our machine-learning classifier for efficient and effective filtering of spam Web tables. We tested it on a large-scale Web tables corpus of approximately 36 million tables.
Pages: 4849–4851
Number of pages: 3