Learning to semantically classify email messages

Cited by: 0
Author
Jiang, Eric [1]
Affiliation
[1] Univ San Diego, San Diego, CA 92110 USA
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
As a semantic vector space model for information retrieval (IR), Latent Semantic Indexing (LSI) employs singular value decomposition (SVD) to transform individual documents into statistically derived semantic vectors. This paper proposes a new junk email (spam) filtering model, 2LSI-SF, which is based on augmented category LSI spaces and classifies email messages by their content. The model exploits the discriminative information in the training data and incorporates several feature selection and message classification algorithms. 2LSI-SF was evaluated on a benchmark spam testing corpus (PU1) and a newly compiled Chinese spam corpus (ZH1). The experimental results, together with a performance comparison against the popular Support Vector Machine (SVM) and naive Bayes classifiers, show that 2LSI-SF filters spam effectively.
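To make the LSI step mentioned in the abstract concrete, the following is a minimal sketch of generic LSI: a truncated SVD of a term-by-document matrix, followed by folding a new message into the reduced space. It is not the 2LSI-SF model itself; the abstract does not specify how the augmented category spaces, feature selection, or classifiers are built. The five-term vocabulary, the four toy messages, the rank `k = 2`, and the helper names `lsi_fit`, `lsi_fold_in`, and `cosine` are all assumptions made for illustration.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix (rows: terms, columns: documents)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    doc_vectors = Vt[:k, :].T  # one k-dimensional semantic vector per document
    return U_k, s_k, doc_vectors

def lsi_fold_in(term_vec, U_k, s_k):
    """Project a new term-count vector into the same k-dimensional space."""
    return (term_vec @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vocabulary: free, offer, win, meeting, report.
# Columns 0-1 are spam-like messages, columns 2-3 are legitimate ones.
term_doc = np.array([
    [2, 1, 0, 0],  # free
    [1, 2, 0, 0],  # offer
    [1, 1, 0, 0],  # win
    [0, 0, 2, 1],  # meeting
    [0, 0, 1, 2],  # report
], dtype=float)

U_k, s_k, doc_vectors = lsi_fit(term_doc, k=2)

# A new message containing "free" and "win" once each.
query = np.array([1, 0, 1, 0, 0], dtype=float)
query_vec = lsi_fold_in(query, U_k, s_k)

similarities = [cosine(query_vec, d) for d in doc_vectors]
print(similarities)
```

In this toy setting the new message lands next to the two spam-like messages and is orthogonal to the legitimate ones, so a nearest-neighbour rule in the reduced space would label it as spam. A real filter would use weighted term frequencies, a much larger vocabulary, and a tuned rank.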
Pages: 700-711
Page count: 12