Information Extraction from Spam Emails using Stylistic and Semantic Features to Identify Spammers

Cited by: 0
Authors
Halder, Soma [1]
Tiwari, Richa [1]
Sprague, Alan [1]
Affiliation
[1] Univ Alabama Birmingham, Birmingham, AL 35229 USA
Keywords
Spam; semantics; stylistics; natural language processing; IP address
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Traditional anti-spam methods filter spam emails and keep them out of the inbox, but take no measures to trace and penalize spammers. We use natural language processing techniques to cluster spam emails from the same spammer based on the content and style of the email. Spam emails from different sources are studied using stylistic features, semantic features, and a combination of both. Three clusterings are performed: one based on stylistic features, one based on semantic features, and one based on the combined features. These clusters are then compared and evaluated. We observe that spam emails from the same source share similarities and cluster together. These emails contain URLs of the web pages that the spammer is trying to promote. Clusters are mapped to the Internet Protocol (IP) addresses of these URLs, and the whois information for those IP addresses helps to identify the source of the spam.
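The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the specific features or the clustering algorithm, so the character-ratio style cues, the bag-of-words vectors, the greedy threshold clustering, the sample emails, and all function names below are assumptions chosen only for illustration. The IP resolution and whois lookup steps need network access and are left out; the sketch stops at collecting the URL hosts for each cluster.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+")

def stylistic_features(text):
    """Assumed style cues: character-class ratios and scaled mean word length."""
    chars = [c for c in text if not c.isspace()]
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(chars), 1)
    return {
        "style:upper": sum(c.isupper() for c in chars) / n,
        "style:digit": sum(c.isdigit() for c in chars) / n,
        "style:punct": sum(not c.isalnum() for c in chars) / n,
        "style:word_len": (sum(map(len, words)) / len(words) / 10) if words else 0.0,
    }

def semantic_features(text):
    """Assumed content cues: relative frequency of words (3+ letters), URLs removed."""
    body = URL_RE.sub(" ", text)
    counts = Counter(w.lower() for w in re.findall(r"[A-Za-z]{3,}", body))
    total = sum(counts.values()) or 1
    return {f"word:{w}": c / total for w, c in counts.items()}

def combined_features(text):
    """Concatenate both feature sets; key prefixes keep them from colliding."""
    return {**stylistic_features(text), **semantic_features(text)}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(texts, featurize, threshold):
    """Greedy single pass: join the first cluster whose seed email is similar enough."""
    vectors = [featurize(t) for t in texts]
    clusters = []  # each cluster is a list of indices; the first index is its seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def url_hosts(text):
    """Host names of the URLs promoted in an email."""
    hosts = {urlparse(u).hostname for u in URL_RE.findall(text)}
    return sorted(h for h in hosts if h)

def hosts_per_cluster(texts, clusters):
    """Union of promoted hosts per cluster; whois on their IPs would be the next step."""
    return [sorted({h for i in members for h in url_hosts(texts[i])}) for members in clusters]

# Made-up sample emails, for illustration only.
emails = [
    "BUY CHEAP WATCHES NOW!!! Visit http://watches.example.com/deal today",
    "CHEAP WATCHES!!! BUY NOW at http://watches.example.com/offer",
    "dear friend, i need your help to transfer funds. reply at http://bank.example.org/form",
]

semantic_clusters = cluster(emails, semantic_features, threshold=0.5)
print(semantic_clusters)
print(hosts_per_cluster(emails, semantic_clusters))
```

Passing `stylistic_features` or `combined_features` to `cluster` gives the other two clusterings, which can then be compared against the semantic one. The threshold of 0.5 is arbitrary and would need tuning on real data.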
Pages: 104-107
Page count: 4