Information Extraction from Spam Emails using Stylistic and Semantic Features to Identify Spammers

Cited by: 0
Authors
Halder, Soma [1]
Tiwari, Richa [1]
Sprague, Alan [1]
Affiliations
[1] Univ Alabama Birmingham, Birmingham, AL 35229 USA
Keywords
Spam; semantics; stylistics; natural language processing; IP address
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Traditional anti-spam methods filter spam emails and keep them out of the inbox, but they take no measures to trace and penalize spammers. We use natural language processing techniques to cluster spam emails from the same spammer based on the content and style of the email. Spam emails from different sources are studied using stylistic features, semantic features, and a combination of both. Three clusterings are performed: one based on stylistic features, one based on semantic features, and one based on the combined features. These clusterings are then compared and evaluated. We observe that spam emails from the same source share similarities and cluster together. These emails contain URLs of the web pages that the spammer is trying to promote. Clusters are mapped to the Internet Protocol (IP) addresses of these URLs, and the WHOIS information for those IP addresses helps identify the source of the spam.
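The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the features or the clustering algorithm, so the character-ratio style features, the bag-of-words stand-in for semantic features, the greedy cosine-threshold clustering, and every function name below are assumptions made for illustration only.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+")


def stylistic_features(text):
    """Assumed style vector: character-class ratios plus scaled mean word length."""
    n = max(len(text), 1)
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "upper": sum(c.isupper() for c in text) / n,
        "digit": sum(c.isdigit() for c in text) / n,
        "punct": sum(c in "!?$*.," for c in text) / n,
        "word_len": (sum(map(len, words)) / len(words) / 10) if words else 0.0,
    }


def semantic_features(text):
    """Assumed stand-in for semantic features: word counts with URLs removed."""
    return Counter(re.findall(r"[a-z]+", URL_RE.sub(" ", text.lower())))


def combined_features(text):
    """Concatenate both feature sets under distinct key prefixes."""
    combined = {f"sty:{k}": v for k, v in stylistic_features(text).items()}
    combined.update({f"sem:{k}": v for k, v in semantic_features(text).items()})
    return combined


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def url_hosts(text):
    """Host names of the URLs promoted in an email, sorted and de-duplicated."""
    return sorted({urlparse(u).hostname for u in URL_RE.findall(text)})
```

Calling `cluster` on the output of `stylistic_features`, `semantic_features`, or `combined_features` yields the three clusterings that the abstract compares. Each host returned by `url_hosts` could then be resolved to an IP address (for example with `socket.gethostbyname`) and looked up in WHOIS; that step needs network access and is left out of the sketch.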
Pages: 104-107
Page count: 4