Adding semantics to email clustering

Cited by: 0
Authors
Li, Hua [1]
Shen, Dou
Zhang, Benyu
Chen, Zheng
Yang, Qiang
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
[2] Hong Kong University of Science and Technology, Department of Computer Science, Hong Kong, People's Republic of China
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a novel algorithm that clusters emails according to their contents and the sentence styles of their subject lines. The algorithm uses natural language processing and frequent itemset mining techniques to automatically generate meaningful generalized sentence patterns (GSPs) from email subjects. We then put forward a novel unsupervised approach that treats GSPs as pseudo class labels and conducts email clustering in a supervised manner, although no human labeling is involved. The proposed algorithm is expected not only to improve clustering performance but also to provide meaningful descriptions of the resulting clusters through the GSPs. Experimental results on an open dataset (the Enron email dataset) and a personal email dataset collected by ourselves demonstrate that the proposed algorithm outperforms the K-means algorithm in terms of the popular F1 measure. Furthermore, cluster naming readability is improved by 68.5% on the personal email dataset.
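The pseudo-labelling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain lowercase word sets stand in for GSPs (the paper's NLP-based generalization step is not reproduced), patterns are counted by brute force rather than with a dedicated frequent itemset miner, and the names `mine_patterns`, `assign_pseudo_labels`, `min_support`, and `max_size` are hypothetical. The sketch stops at the pseudo labels; training a classifier on them is left out.

```python
from collections import Counter
from itertools import combinations


def mine_patterns(subjects, min_support=2, max_size=2):
    """Return word sets (stand-ins for GSPs) found in at least min_support subjects."""
    counts = Counter()
    for subject in subjects:
        tokens = sorted(set(subject.lower().split()))
        # Count every word combination of up to max_size words once per subject.
        for size in range(1, max_size + 1):
            for combo in combinations(tokens, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def assign_pseudo_labels(subjects, patterns):
    """Label each subject with the largest, most frequent pattern it contains."""
    labels = []
    for subject in subjects:
        tokens = frozenset(subject.lower().split())
        matches = [p for p in patterns if p <= tokens]
        if matches:
            # Prefer longer patterns, then higher support; sorted words break ties.
            best = max(matches, key=lambda p: (len(p), patterns[p], sorted(p)))
            labels.append(best)
        else:
            labels.append(None)  # no frequent pattern covers this subject
    return labels


subjects = [
    "meeting on friday",
    "meeting on monday",
    "lunch today",
    "project report due",
    "project report draft",
]
labels = assign_pseudo_labels(subjects, mine_patterns(subjects))
for subject, label in zip(subjects, labels):
    print(subject, "->", sorted(label) if label else None)
```

In this toy input the two "meeting on" subjects share one pseudo label and the two "project report" subjects share another, while "lunch today" stays unlabeled. In the approach the abstract describes, emails labeled this way would serve as training data for a supervised classifier, and the matched pattern doubles as a readable cluster name.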
Pages: 938-942
Number of pages: 5