The Document Similarity Index based on the Jaccard Distance for Mail Filtering

Cited by: 9
Authors
Temma, Seiya [1]
Sugii, Manabu [2]
Matsuno, Hiroshi [1]
Affiliations
[1] Yamaguchi Univ, Grad Sch Sci & Technol Innovat, Yamaguchi, Japan
[2] Yamaguchi Univ, Fac Global & Sci Studies, Yamaguchi, Japan
Keywords
mail-filtering; text mining; Jaccard index; co-occurrence words; attribute information
DOI
10.1109/itc-cscc.2019.8793419
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
We propose a new similarity index, based on the Jaccard index, for classifying emails as ham or spam. The index uses the co-occurrence values of all word pairs in an email. Word co-occurrence captures a kind of context in a document, because a word is often used together with another word in the same context. The proposed method classified emails as ham or spam with a higher accuracy rate than the existing filtering system, which uses word appearance frequency. The method was able to extract patterns of word usage that reflect the context of emails.
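The abstract does not give the exact definition of the proposed index, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that two words "co-occur" when they appear anywhere in the same email, that each email is represented as a set of unordered word pairs, and that a message takes the label of its most similar labeled example. The function names (`cooccurrence_pairs`, `jaccard_index`, `jaccard_distance`, `classify`) and the sample emails are hypothetical.

```python
import re
from itertools import combinations


def cooccurrence_pairs(text):
    """Return the set of unordered word pairs that appear together in one email.

    Assumption: the co-occurrence window is the whole email.
    """
    words = sorted(set(re.findall(r"[a-z']+", text.lower())))
    return set(combinations(words, 2))


def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B| of two sets."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Jaccard distance, defined as 1 minus the Jaccard index."""
    return 1.0 - jaccard_index(a, b)


def classify(email, ham_examples, spam_examples):
    """Label an email by its most similar labeled example (illustrative rule only)."""
    pairs = cooccurrence_pairs(email)
    ham_score = max(jaccard_index(pairs, cooccurrence_pairs(h)) for h in ham_examples)
    spam_score = max(jaccard_index(pairs, cooccurrence_pairs(s)) for s in spam_examples)
    return "spam" if spam_score > ham_score else "ham"


if __name__ == "__main__":
    a = cooccurrence_pairs("win free prize")
    b = cooccurrence_pairs("free prize now")
    print(jaccard_index(a, b))     # one shared pair out of five distinct pairs
    print(jaccard_distance(a, b))
    print(classify("claim your free prize",
                   ham_examples=["meeting agenda for monday"],
                   spam_examples=["win a free prize now"]))
```

Comparing sets of word pairs rather than sets of single words is what lets the score reflect which words are used together; a frequency-based filter would treat the same words identically regardless of their companions.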
Pages: 221-224
Number of pages: 4