Spam filtering based on online ranking logistic regression

Cited by: 0
Authors
Affiliations
[1] Sun, Guanglu
[2] Qi, Haoliang
Source
Sun, G. (guanglu_sun@163.com) | Year 1600 / Volume Tsinghua University / Issue 53
Keywords
Binary classification - Classification models - Discriminative models - Logistic Regression modeling - Machine learning methods - On-line rankings - Spam - Statistical significance
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Spam filtering is an important problem in Web information processing, and many machine learning methods have been applied to it. Current research casts filtering as binary classification, but the optimization target of the classification model is not consistent with 1-AUC, the usual evaluation measure. This inconsistency biases model optimization and degrades filtering results. In this study, spam filtering was instead cast as a ranking problem by optimizing directly toward 1-AUC, and an online ranking logistic regression model was proposed to correct the deviation of the model's scores in the online learning module. TONE (train on or near error), re-sampling, and weight-update methods were used to speed up learning during online adjustment of the model's parameters. Experiments on open evaluation datasets show that the proposed method outperforms the traditional online logistic regression model with statistical significance.
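The abstract does not give the update rule, so the following is only a minimal sketch of how a pairwise (ranking) logistic regression could be trained online against 1-AUC. Everything in it is an assumption made for illustration, not the authors' implementation: the class name `OnlineRankingLR`, the sparse dict feature format, the per-class buffers used for re-sampling, the `margin` threshold used as a TONE-style "train on or near error" test, and all hyperparameter values.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Linear score of a sparse feature dict x under weight dict w.
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def one_minus_auc(scores, labels):
    # Fraction of (spam, ham) pairs ranked in the wrong order; ties count half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one spam and one ham message")
    bad = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return bad / (len(pos) * len(neg))


class OnlineRankingLR:
    """Hypothetical online pairwise logistic regression (illustrative only)."""

    def __init__(self, lr=0.1, margin=0.2, buffer_size=50,
                 pairs_per_step=5, seed=0):
        self.w = {}
        self.lr = lr
        self.margin = margin            # assumed TONE-style threshold
        self.buffer_size = buffer_size  # assumed re-sampling pool size
        self.pairs_per_step = pairs_per_step
        self.buffers = {0: [], 1: []}   # recent ham (0) and spam (1) messages
        self.rng = random.Random(seed)

    def predict(self, x):
        return score(self.w, x)

    def update(self, x, y):
        # Pair the new message with re-sampled messages of the other class.
        opposite = self.buffers[1 - y]
        k = min(self.pairs_per_step, len(opposite))
        for other in self.rng.sample(opposite, k):
            spam, ham = (x, other) if y == 1 else (other, x)
            p = sigmoid(score(self.w, spam) - score(self.w, ham))
            # Train only when the pair is misranked or close to misranked.
            if p < 0.5 + self.margin:
                g = self.lr * (1.0 - p)  # gradient step on -log(p)
                for f, v in spam.items():
                    self.w[f] = self.w.get(f, 0.0) + g * v
                for f, v in ham.items():
                    self.w[f] = self.w.get(f, 0.0) - g * v
        buf = self.buffers[y]
        buf.append(x)
        if len(buf) > self.buffer_size:
            buf.pop(0)
```

The design point this sketch tries to show is that the loss is defined on the score difference between a spam message and a ham message, so each gradient step directly pushes spam above ham in the ranking, which is the quantity 1-AUC counts. A plain binary logistic loss instead fits each message's label independently.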
Related papers
50 in total
  • [21] Spam filtering based on classifiers ensemble
    Yang, Zhen
    Fan, Ke-Feng
    Lei, Jian-Jun
    Lai, Ying-Xu
    Tongxin Xuebao/Journal on Communication, 2008, 29 (SUPPL.): 7-11
  • [22] Hybrid Decision Tree and Logistic Regression Classifier for Email Spam Detection
    Wijaya, Adi
    Bisri, Achmad
    PROCEEDINGS OF 2016 8TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND ELECTRICAL ENGINEERING (ICITEE), 2016
  • [23] ANOMALY-BASED SPAM FILTERING
    Santos, Igor
    Laorden, Carlos
    Ugarte-Pedrero, Xabier
    Sanz, Borja
    Bringas, Pablo G.
    SECRYPT 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SECURITY AND CRYPTOGRAPHY, 2011: 5-14
  • [24] Content-Based Spam Filtering
    Almeida, Tiago A.
    Yamakami, Akebo
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010
  • [25] Analysis of Spam Detection Using Integration of Logistic Regression and PSO Algorithm
    Ponmalar, A.
    Rajkumar, K.
    Hariharan, U.
    Kalaiselvi, V.K.G.
    Deeba, S.
    Proceedings of the 2021 4th International Conference on Computing and Communications Technologies, ICCCT 2021, 2021: 396-402
  • [26] Efficient and effective spam filtering and re-ranking for large web datasets
    Gordon V. Cormack
    Mark D. Smucker
    Charles L. A. Clarke
    Information Retrieval, 2011, 14: 441-465
  • [27] Efficient and effective spam filtering and re-ranking for large web datasets
    Cormack, Gordon V.
    Smucker, Mark D.
    Clarke, Charles L. A.
    INFORMATION RETRIEVAL, 2011, 14 (05): 441-465
  • [28] Eigenvector Spatial Filtering-Based Logistic Regression for Landslide Susceptibility Assessment
    Li, Huifang
    Chen, Yumin
    Deng, Susu
    Chen, Meijie
    Fang, Tao
    Tan, Huangyuan
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2019, 8 (08)
  • [29] Suspicious URL Filtering based on Logistic Regression with Multi-view Analysis
    Su, Ke-Wei
    Wu, Kuo-Ping
    Lee, Hahn-Ming
    Wei, Te-En
    2013 EIGHTH ASIA JOINT CONFERENCE ON INFORMATION SECURITY (ASIAJCIS), 2013: 77-84
  • [30] Detection model of effectiveness of Chinese online reviews based on logistic regression
    Wu, Hanqian
    Zhu, Yunjie
    Xie, Jue
    Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition), 2015, 45 (03): 433-437