A NEW APPROACH FOR DOCUMENT CLUSTERING USING MAPREDUCE (VAR-SECTING CLUSTERING)

Cited by: 0
Authors
Elsayed, Abdelrahman [1]
Ismail, Osama [2]
Mokhtar, Hoda M. O. [2]
Affiliations
[1] Agr Res Ctr, Cent Lab Agr Expert Syst, Giza, Egypt
[2] Cairo Univ, Fac Comp & Informat, Giza, Egypt
Keywords
Clustering; MapReduce; K-means algorithm; Distributed computing; WordNet
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering is the process of grouping related documents together. It facilitates the organization of search results and document management. The k-means algorithm and its variant, bisecting k-means, have been applied to document clustering and have produced good clustering results. The growing number of available documents calls for distributed computing and for the large pool of computing resources available through cloud computing. This paper introduces the Var-secting k-means algorithm. In addition to generating a binary tree, as bisecting k-means does, it can generate a hierarchy tree with a variable number of nodes per tree level. The experimental results show that Var-secting k-means utilizes distributed computing nodes better than bisecting k-means, especially under the MapReduce programming model.
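The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a hierarchy with a variable number of children per tree level. It assumes that every cluster on a level is split by plain k-means and that documents are already represented as numeric vectors; the names `kmeans`, `var_secting`, and `splits_per_level` are hypothetical and do not come from the paper.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over lists of floats; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires k <= len(points)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest centre (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Recompute each centre as the cluster mean; keep the old centre if empty.
        centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]


def var_secting(points, splits_per_level):
    """Build a hierarchy level by level.

    Assumption (not from the paper): at depth d, every cluster is split into
    at most splits_per_level[d] children. A list of all 2s gives a binary tree.
    """
    levels = [[points]]
    for k in splits_per_level:
        next_level = []
        for cluster in levels[-1]:
            if len(cluster) < k:
                # Too few points to split k ways: carry the cluster down unchanged.
                next_level.append(cluster)
            else:
                next_level.extend(kmeans(cluster, k))
        levels.append(next_level)
    return levels


if __name__ == "__main__":
    # Toy 2-D "document vectors" purely for illustration.
    docs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9],
            [4.8, 5.3], [9.0, 0.1], [9.2, 0.3], [8.9, 0.0]]
    for depth, level in enumerate(var_secting(docs, [3, 2])):
        print(depth, [len(c) for c in level])
```

In this sketch the splits on one level do not depend on each other, so each could in principle be handed to a separate worker. That is one plausible reading of why a wider level could keep more distributed nodes busy than a strictly binary split, but it is an interpretation and not a claim taken from the paper.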
Pages: 57-64
Number of pages: 8