Information Extraction from Spam Emails using Stylistic and Semantic Features to Identify Spammers

Authors
Halder, Soma [1]
Tiwari, Richa [1]
Sprague, Alan [1]
Affiliation
[1] Univ Alabama Birmingham, Birmingham, AL 35229 USA
Keywords
Spam; semantics; stylistics; natural language processing; IP address
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Traditional anti-spam methods filter spam emails and keep them out of the inbox, but take no measures to trace spammers and penalize them. We use natural language processing techniques to cluster spam emails from the same spammer based on the content and style of the email. Spam emails from different sources are studied using stylistic features, semantic features, and a combination of both. Three clusterings are performed: one based on stylistic features, one based on semantic features, and one based on the combined features. These clusterings are then compared and evaluated. We observe that spam emails from the same source are similar and cluster together. These emails contain URLs of the web pages that the spammer is trying to promote. Clusters are mapped to the Internet Protocol (IP) addresses of these URLs, and the whois information for those IP addresses helps identify the source of the spam.
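The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than the authors' method: the abstract does not list the actual features or the clustering algorithm, so the four style measures, the bag-of-words "semantic" vector, the cosine threshold of 0.5, and the helper names (`stylistic_features`, `semantic_features`, `cluster`, `url_hosts`) are all hypothetical. The sample emails are invented, and resolving hosts to IP addresses and querying whois is left out because it requires network access.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>]+")

def stylistic_features(text):
    """Hypothetical style measures; the abstract does not name the real ones."""
    words = re.findall(r"[A-Za-z']+", text)
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "upper_ratio": sum(c.isupper() for c in text) / n_chars,
        "punct_ratio": sum(c in "!?.,;:" for c in text) / n_chars,
        "exclaim_per_word": text.count("!") / n_words,
    }

def semantic_features(text):
    """Bag of lowercase words (3+ letters) with URLs removed: a stand-in for semantic features."""
    body = URL_RE.sub(" ", text.lower())
    return Counter(re.findall(r"[a-z]{3,}", body))

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(vectors, threshold):
    """Single-link clustering: merge any two items whose cosine similarity >= threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def url_hosts(text):
    """Host names of the URLs advertised in an email body."""
    hosts = {urlparse(u).hostname for u in URL_RE.findall(text)}
    return sorted(h for h in hosts if h)

# Invented sample emails, for illustration only.
emails = [
    "Cheap pills online!!! Buy cheap pills now http://pills.example.com/buy",
    "Buy cheap pills today, best pills online http://pills.example.com/deal",
    "Refinance your mortgage, low mortgage rates http://loans.example.net/apply",
    "Low rates! Refinance mortgage today http://loans.example.net/now",
]

clusters = cluster([semantic_features(e) for e in emails], threshold=0.5)
cluster_hosts = [
    sorted({h for i in members for h in url_hosts(emails[i])})
    for members in clusters
]
```

A stylistic or combined clustering could reuse `cluster` on the output of `stylistic_features`, although those values would first need scaling so that one measure such as average word length does not dominate the cosine. In this sketch each cluster's hosts would then be resolved to IP addresses and looked up in whois as a separate, networked step.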
Pages: 104-107
Page count: 4