Frequent Term Based Text Document Clustering Using Similarity Measures: A Novel Approach

Cited by: 0
Authors
Gupta, Vijay Kumar [1 ]
Dutta, Maitreyee [2 ]
Kumar, Manoj [3 ]
Affiliations
[1] Govt Girls Polytech, Dept IT, Charkhari, Mahoba, India
[2] NITTTR, Dept CS&E, Chandigarh, India
[3] BBDNITM, Dept IT, Lucknow, Uttar Pradesh, India
Keywords
Clustering; Data Mining; Cosine Similarity; Similarity Index; Fuzzy Logic; Support Vector Machine; ALGORITHM;
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
Clustering is a long-established way to ensure that documents are retrieved quickly and according to the requirement. It keeps similar documents together so that they can be retrieved easily. The measure that quantifies the relation between two documents is called a similarity index. Several kinds of similarity index are already in use. The proposed algorithm combines two kinds of similarity index to produce a new one. The similarity index plays a vital role in the clustering and classification procedure. The proposed algorithm also uses fuzzy logic for the clustering rules, and the result is then classified by a Support Vector Machine to justify the accuracy of the proposed solution.
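The idea of combining two similarity indices into one can be illustrated with a minimal sketch. The abstract does not say which two indices are combined or how, so everything here is an assumption made for illustration: cosine similarity (listed in the keywords) is paired with Jaccard similarity, and the two are merged by a weighted average. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-frequency vectors of two token lists."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(doc_a, doc_b):
    """Shared distinct terms divided by all distinct terms (assumed second index)."""
    set_a, set_b = set(doc_a), set(doc_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def combined_similarity(doc_a, doc_b, weight=0.5):
    """Weighted average of the two indices; the combination rule is an assumption."""
    return (weight * cosine_similarity(doc_a, doc_b)
            + (1 - weight) * jaccard_similarity(doc_a, doc_b))


doc_1 = ["data", "mining", "cluster"]
doc_2 = ["data", "cluster", "fuzzy"]
score = combined_similarity(doc_1, doc_2)
```

With equal weights, the score lies between 0 (no shared terms) and 1 (identical term profiles); a clustering step could then group documents whose pairwise score exceeds a chosen threshold.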
Pages: 164 - 169
Page count: 6
Related papers
50 in total
  • [31] UTILIZING TERM PROXIMITY BASED FEATURES TO IMPROVE TEXT DOCUMENT CLUSTERING
    Paliwal, Shashank
    Pudi, Vikram
    KDIR 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2011, : 537 - 544
  • [32] On Term Similarity Measures for Short Text Classification
    Seki, Hirohisa
    Toriyama, Shuhei
    2019 IEEE 11TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (IWCIA 2019), 2019, : 53 - 58
  • [33] Hierarchical document clustering using frequent itemsets
    Fung, BCM
    Wang, K
    Ester, M
    PROCEEDINGS OF THE THIRD SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2003, : 59 - 70
  • [34] Text clustering based on asymmetric similarity
    School of Software, Tsinghua University, Beijing 100084, China
    Qinghua Daxue Xuebao, 2006, 7 (1325-1328):
  • [35] Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach
    Kutty, Sangeetha
    Tran, Tien
    Nayak, Richi
    Li, Yuefeng
    FOCUSED ACCESS TO XML DOCUMENTS, 2008, 4862 : 183 - 194
  • [36] A hybrid approach for text document clustering using Jaya optimization algorithm
    Thirumoorthy, Karpagalingam
    Muneeswaran, Karuppaiah
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 178
  • [37] Improving Frequent-Term Based Text Clustering With Word Belief Network
    Zhang, Yong
    Liu, Ruifang
    Luo, Ruiyang
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY II, PTS 1-4, 2013, 411-414 : 207 - 214
  • [38] Text document clustering based on neighbors
    Luo, Congnan
    Li, Yanjun
    Chung, Soon M.
    DATA & KNOWLEDGE ENGINEERING, 2009, 68 (11) : 1271 - 1288
  • [39] Document clustering based on maximal frequent sequences
    Hernandez-Reyes, Edith
    Garcia-Hernandez, Rene A.
    Carrasco-Ochoa, J. A.
    Martinez-Trinidad, J. Fco.
    ADVANCES IN NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4139 : 257 - 267
  • [40] A maximal frequent itemset approach for web document clustering
    Zhuang, L
    Dai, HH
    FOURTH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2004, : 970 - 977