Semantic Oriented Document Clustering Using Distribution Semantics

Cited by: 1
Authors
Khan, Umar Ali [1]
Rafi, Muhammad [1]
Affiliations
[1] Natl Univ & Emerging Sci, Karachi, Pakistan
Keywords
Document clustering; distributional semantics; hierarchical agglomerative clustering (HAC)
DOI
10.1145/3206098.3206110
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The exponential growth of electronic textual documents in both public and proprietary storage forces researchers to find ways to efficiently extract meaningful, actionable information from these documents. Document clustering has found its niche in this area. This paper proposes a document representation model based on distributional semantics. The law of distributional semantics states that linguistic terms that appear with similar distributions in a language corpus generally have similar meanings. The proposed representation uses only those terms (linguistic features) that have the same distribution over a given collection of documents. To build it, the distributional terms are first identified using distributional criteria, and each document is then represented by only these terms. A novel similarity measure is proposed over these document representations that also exploits the nature of distributional semantics in the similarity calculation. Finally, hierarchical agglomerative clustering (HAC) is used to produce the final clusters. Standard text mining datasets are used to measure the effectiveness of this approach. The evaluation is based on cluster purity, and the proposed approach achieves far better clustering results than the conventional approach.
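The pipeline outlined in the abstract (select terms, represent documents by them, cluster with HAC, score by purity) can be illustrated with a minimal Python sketch. The abstract does not specify the paper's distributional criterion or its similarity measure, so two stand-ins are assumed here: a document-frequency band for term selection and Jaccard overlap for similarity. The function names, the `min_df` and `max_df` parameters, the average-link choice, and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def select_terms(docs, min_df=2, max_df=None):
    """Keep terms whose document frequency lies in [min_df, max_df].

    Assumed stand-in for the paper's distributional criterion.
    """
    max_df = len(docs) - 1 if max_df is None else max_df
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {t for t, n in df.items() if min_df <= n <= max_df}


def jaccard(a, b):
    """Set-overlap similarity; assumed stand-in for the paper's measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hac(vectors, k):
    """Average-link agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def purity(clusters, labels):
    """Fraction of documents that carry the majority label of their cluster."""
    majority = sum(Counter(labels[i] for i in c).most_common(1)[0][1]
                   for c in clusters)
    return majority / len(labels)


# Toy collection (hypothetical data, for illustration only).
docs = [
    "stock market shares fall",
    "market shares rise on stock news",
    "team wins football match",
    "football team loses match",
]
labels = ["finance", "finance", "sport", "sport"]

terms = select_terms(docs)
vectors = [set(d.split()) & terms for d in docs]  # documents as selected-term sets
clusters = hac(vectors, k=2)
print(clusters, purity(clusters, labels))
```

Purity is the only part of this sketch that follows a standard definition: each cluster is credited with the count of its most frequent gold label, and the credits are summed and divided by the number of documents.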
Pages: 14-18
Page count: 5
Related papers
50 in total
  • [1] A Survey of Document Clustering using Semantic Approach
    Saiyad, Nagma Y.
    Prajapati, Harshadkumar B.
    Dabhi, Vipul K.
    [J]. 2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016, : 2555 - 2562
  • [2] Semantic Document Clustering Using a Similarity Graph
    Stanchev, Lubomir
    [J]. 2016 IEEE TENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2016, : 1 - 8
  • [3] Text document clustering using semantic neighbors
    Young Researchers Club, Jouybar Branch, Islamic Azad University, Jouybar, Iran
    [J]. J. Softw. Eng., 4 (136-144):
  • [4] Exploiting Document Level Semantics in Document Clustering
    Rafi, Muhammad
    Sharif, Muhammad Naveed
    Arshad, Waleed
    Rafay, Habibullah
    Mohsin, Sheharyar
    Shaikh, Mohammad Shahid
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (06) : 462 - 469
  • [5] Extracting document semantics for semantic header
    Wang, Tao
    Desai, Bipin C.
    [J]. 2006 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-5, 2006, : 2047 - +
  • [6] Web document clustering using semantic link analysis
    Arch-int, Somjit
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 2, PROCEEDINGS, 2006, : 13 - 18
  • [7] Statistical semantics for enhancing document clustering
    Farahat, Ahmed K.
    Kamel, Mohamed S.
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 28 (02) : 365 - 393
  • [8] Statistical semantics for enhancing document clustering
    Ahmed K. Farahat
    Mohamed S. Kamel
    [J]. Knowledge and Information Systems, 2011, 28 : 365 - 393
  • [9] A Survey on Semantic Document Clustering
    Naik, Maitri P.
    Prajapati, Harshadkumar B.
    Dabhi, Vipul K.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES, 2015,
  • [10] Using Latent Semantic Indexing to Improve the Accuracy of Document Clustering
    Zhan, Jiaming
    Loh, Han Tong
    [J]. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2007, 6 (03) : 181 - 188