Distributed Document Clustering Analysis Based on a Hybrid Method

Cited by: 0
Authors
J. E. Judith [1]
J. Jayakumari [1]
Affiliation
[1] Noorul Islam Centre for Higher Education
Keywords
distributed document clustering; Hadoop; K-Means; PSO; MapReduce
DOI
Not available
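Chinese Library Classification number
TP18 [Artificial intelligence theory]; TP391.1 [Text information processing];
Discipline classification code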
081104 ; 0812 ; 081203 ; 0835 ; 1405 ;
Abstract
Clustering has become an increasingly challenging task because the amount of data in scientific research and commercial applications is ever-growing. High-quality, fast document clustering algorithms are in great demand to deal with large volumes of data. The computational requirements for bringing such a growing amount of data to a central site for clustering are complex. The proposed algorithm uses Particle Swarm Optimization (PSO) to find optimal centroids for K-Means clustering. PSO is used for its global search ability to provide optimal centroids, which aid in generating more compact clusters with improved accuracy. The proposed methodology uses the Hadoop and MapReduce framework, which provides distributed storage and analysis to support data-intensive distributed applications. Experiments were performed on the Reuters and RCV1 document datasets and show an improvement in accuracy with reduced execution time.
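The hybrid idea in the abstract (PSO supplies starting centroids, K-Means refines them) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the abstract does not give the particle encoding, fitness function, or PSO parameters, so the choices below (each particle holds k centroids, fitness is the sum of squared distances, and the values of `w`, `c1`, `c2`) are assumptions, as are all function names. The Hadoop/MapReduce distribution step is omitted entirely, and the toy 2-D points stand in for document vectors.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def pso_centroids(points, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centroids with a basic global-best PSO (assumed parameters)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate set of k centroids, seeded from random data points.
    pos = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [sse(points, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Velocity: inertia + pull toward personal best + pull toward global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(points, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, centroids, iters=20):
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = [c[:] for c in centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)

# Toy data: two well-separated groups standing in for document vectors.
docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
seeds = pso_centroids(docs, k=2)          # global search for starting centroids
centroids, labels = kmeans(docs, seeds)   # local refinement
print(labels, sse(docs, centroids))
```

In a MapReduce setting, the nearest-centroid assignment would plausibly run in the map phase and the centroid averaging in the reduce phase, but that mapping is an assumption here rather than something the abstract states.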
Pages: 131-142
Page count: 12
Related papers
50 in total
  • [1] Distributed Document Clustering Analysis Based on a Hybrid Method
    Judith, J. E.
    Jayakumari, J.
    [J]. CHINA COMMUNICATIONS, 2017, 14 (02) : 131 - 142
  • [2] Scalability Analysis of Semantics based Distributed Document Clustering Algorithms
    Shah, Neepa
    Mahajan, Sunita
    [J]. 2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING, INSTRUMENTATION AND CONTROL TECHNOLOGIES (ICICICT), 2017, : 763 - 768
  • [3] XML Document Clustering Based on Spectral Analysis Method
    Li Xinye
    [J]. ADVANCED RESEARCH ON INFORMATION SCIENCE, AUTOMATION AND MATERIAL SYSTEM, PTS 1-6, 2011, 219-220 : 304 - 307
  • [4] An Efficient Hybrid Hierarchical Document Clustering Method
    Zhu, Yehang
    Fung, Benjamin C. M.
    Mu, Dejun
    Li, Yanling
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 395 - +
  • [5] Design and Implement of Distributed Document Clustering Based on MapReduce
    Wan, Jian
    Yu, Wenming
    Xu, Xianghua
    [J]. PROCEEDINGS OF INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2009), 2009, : 278 - 280
  • [6] Distributed hierarchical document clustering
    Deb, Debzani
    Fuad, M. Muztaba
    Angryk, Rafal A.
    [J]. PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER SCIENCE AND TECHNOLOGY, 2006, : 328 - +
  • [7] A Hybrid Method for Manufacturing Text Mining Based on Document Clustering and Topic Modeling Techniques
    Shotorbani, Peyman Yazdizadeh
    Ameri, Farhad
    Kulvatunyou, Boonserm
    Ivezic, Nenad
    [J]. ADVANCES IN PRODUCTION MANAGEMENT SYSTEMS: INITIATIVES FOR A SUSTAINABLE WORLD, 2016, 488 : 777 - 786
  • [8] An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm
    Sardar T.H.
    Ansari Z.
    [J]. Ansari, Zahid (zahid_cs@pace.edu.in), 1600, Springer (101): : 641 - 650
  • [9] A Document Clustering Method based on Hierarchical Algorithm with Model Clustering
    Sun, Haojun
    Liu, Zhihui
    Kong, Lingjun
    [J]. 2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3, 2008, : 1229 - +
  • [10] Ontology Based Document Clustering - An Efficient Hybrid Approach
    Jasila, E. K.
    Saleena, N.
    Nazeer, Abdul K. A.
    [J]. PROCEEDINGS OF THE 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (IACC 2019), 2019, : 153 - 157