Enhanced Topic-based Vector Space Model for semantics-aware spam filtering

Cited by: 27
Authors
Santos, Igor [1 ]
Laorden, Carlos [1 ]
Sanz, Borja [1 ]
Bringas, Pablo G. [1 ]
Affiliation
[1] Univ Deusto, Lab Smartness Semant & Secur S3Lab, Bilbao 48007, Spain
Keywords
Spam detection; Information Retrieval; Semantics; Computer security; Machine-learning; REGRESSION;
DOI
10.1016/j.eswa.2011.07.034
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. More than 85% of received e-mails are spam. Historical approaches to combat these messages, including simple techniques such as sender blacklisting or the use of e-mail signatures, are no longer completely reliable. Currently, many solutions feature machine-learning algorithms trained using statistical representations of the terms that usually appear in the e-mails. Still, these methods are merely syntactic and are unable to account for the underlying semantics of terms within the messages. In this paper, we explore the use of semantics in spam filtering by representing e-mails with a recently introduced Information Retrieval model: the enhanced Topic-based Vector Space Model (eTVSM). This model is capable of representing linguistic phenomena using a semantic ontology. Based upon this representation, we apply several well-known machine-learning models and show that the proposed method can detect the internal semantics of spam messages. (C) 2011 Elsevier Ltd. All rights reserved.
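The contrast the abstract draws between purely syntactic term statistics and a semantics-aware representation can be illustrated with a toy sketch. This is not the eTVSM itself and not the authors' implementation: the `TERM_TO_CONCEPT` mapping, the function names, and the two sample messages are all hypothetical, and a real ontology would model far more than a flat synonym table. The sketch only shows why two messages that share no surface terms look unrelated under a plain term-count vector, yet similar once terms are mapped to shared concepts.

```python
import math
from collections import Counter

# Hypothetical toy "ontology": maps surface terms to a shared concept label.
# This flat table is an assumption for illustration only; the eTVSM ontology
# described in the abstract is much richer than a synonym lookup.
TERM_TO_CONCEPT = {
    "cheap": "low_price",
    "inexpensive": "low_price",
    "pills": "medication",
    "meds": "medication",
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def term_vector(text):
    """Purely syntactic representation: raw term counts."""
    return Counter(text.lower().split())


def concept_vector(text):
    """Toy semantic representation: terms replaced by concepts when known."""
    return Counter(TERM_TO_CONCEPT.get(t, t) for t in text.lower().split())


msg_a = "cheap pills"
msg_b = "inexpensive meds"

# No shared surface terms, so the syntactic similarity is zero.
syntactic = cosine(term_vector(msg_a), term_vector(msg_b))
# Both messages map to the same two concepts, so similarity is close to one.
semantic = cosine(concept_vector(msg_a), concept_vector(msg_b))

print(f"syntactic similarity: {syntactic:.3f}")
print(f"semantic similarity:  {semantic:.3f}")
```

Either kind of vector could then be passed to a standard classifier; the sketch stops at the representation step because that is where the two approaches differ.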
Pages: 437-444
Number of pages: 8