Scalable Spam Classifier for Web Tables

Times cited: 0
Authors
Villasenor, Santiago [1]
Nguyen, Tom [1]
Kola, Anusha [1]
Soderman, Sean [1]
Gubanov, Michael [1]
Affiliation
[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
Keywords
Web-search; Large-scale Data Management; Cloud Computing; Data Fusion and Cleaning; Summarization; Human-Computer Interaction;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Internet mail spam is a problem for most organizations and individuals. Spam can be classified into two categories: fraud and commercial. The fraud category includes phishing, scams, malware, counterfeit products, and other criminal activity. The commercial category includes unwanted promotional messages and newsletters sent illegally by legitimate organizations. Fraud can be seen as a high-threat, high-volume category, while commercial spam is the opposite. As with mail, there are spam Web tables that have no useful content. Here we describe our machine-learning classifier for efficient and effective spam filtering of Web tables, which was tested on a large-scale Web tables corpus of approximately 36 million tables.
Pages: 4849-4851
Page count: 3