Spam Filtering by Semantics-based Text Classification

Cited by: 0
Authors
Hu, Wei [1]
Du, Jinglong [1]
Xing, Yongkang [1]
Affiliation
[1] Chongqing University, College of Computer Science, Chongqing, People's Republic of China
Keywords
spam filter; text classification; semantics extraction; feature selection
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spam has been a serious and annoying problem for decades. Although many solutions have been proposed, there is still considerable room to filter spam email more efficiently. A major problem in spam filtering today, as in text classification in natural language processing generally, is the very high dimensionality of the vector space caused by the large number of feature terms. This usually leads to heavy computation and slow classification. Traditional approaches use words as feature terms. Extracting semantic meanings from the text content and using those meanings as the feature terms that build the vector space can instead reduce the vector dimension effectively while also improving classification. This paper proposes a novel Chinese spam filtering approach based on semantics-based text classification, in which the feature terms are selected from the semantic meanings of the text content. Both the extraction of semantic meanings and the selection of feature terms are implemented by attaching annotations to the text layer by layer. The filter performed well in experiments on a public Chinese spam corpus.
Pages: 89-94
Number of pages: 6
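The abstract's central idea is to replace word-level vector dimensions with a smaller set of semantic features produced by layered annotation. The minimal Python sketch below illustrates that idea only. It is not the authors' method. The lexicon `WORD_TO_CONCEPT`, the second-layer rule table `CONCEPT_PAIRS_TO_MEANING`, and all helper names are hypothetical. The input is assumed to be already word-segmented; real Chinese text would need segmentation first. The paper's actual feature-selection criterion and classifier are not described in this record and are not reproduced.

```python
from collections import Counter

# Hypothetical layer-1 lexicon: surface word -> semantic concept.
# (Assumption: the paper's real annotation resources are not given in this record.)
WORD_TO_CONCEPT = {
    "invoice": "FINANCE", "payment": "FINANCE", "refund": "FINANCE", "loan": "FINANCE",
    "discount": "PROMOTION", "sale": "PROMOTION", "free": "PROMOTION", "offer": "PROMOTION",
    "meeting": "WORK", "report": "WORK", "deadline": "WORK", "project": "WORK",
}

# Hypothetical layer-2 rules: a set of co-occurring concepts -> higher-level meaning.
CONCEPT_PAIRS_TO_MEANING = {
    frozenset({"FINANCE", "PROMOTION"}): "COMMERCIAL_SOLICITATION",
}

# The vector space is spanned by semantic features, not by words.
FEATURE_SPACE = sorted(
    set(WORD_TO_CONCEPT.values()) | set(CONCEPT_PAIRS_TO_MEANING.values())
)


def annotate_concepts(tokens):
    """Layer 1: attach a concept annotation to every word the lexicon knows."""
    return [WORD_TO_CONCEPT[t] for t in tokens if t in WORD_TO_CONCEPT]


def annotate_meanings(concepts):
    """Layer 2: annotate higher-level meanings whose required concepts all occur."""
    present = set(concepts)
    return [meaning for required, meaning in CONCEPT_PAIRS_TO_MEANING.items()
            if required <= present]


def to_vector(tokens):
    """Count the annotations from both layers over the fixed semantic feature space."""
    concepts = annotate_concepts(tokens)
    counts = Counter(concepts + annotate_meanings(concepts))
    return [counts.get(feature, 0) for feature in FEATURE_SPACE]


if __name__ == "__main__":
    email = ["free", "loan", "offer", "refund", "now"]  # "now" is not in the lexicon
    print(len(WORD_TO_CONCEPT), "word features vs", len(FEATURE_SPACE), "semantic features")
    # prints "12 word features vs 4 semantic features"
    print(dict(zip(FEATURE_SPACE, to_vector(email))))
    # prints {'COMMERCIAL_SOLICITATION': 1, 'FINANCE': 2, 'PROMOTION': 2, 'WORK': 0}
```

Every known word collapses onto one of a few concepts. As a result, the vector length is set by the number of semantic features (four here) rather than by the vocabulary size (twelve here), and it does not grow when more synonyms are added to the lexicon.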