Distributed Algorithm for Text Documents Clustering Based on k-Means Approach

被引:8
|
作者
Sarnovsky, Martin [1 ]
Carnoka, Noema [1 ]
机构
[1] Tech Univ Kosice, Fac Electrotech & Informat, Dept Cybernet & Artificial Intelligence, Letna 9-a, Kosice 04001, Slovakia
关键词
Clustering; Text mining; k-Means; Distributed computing; SIDED CONCEPT LATTICES;
D O I
10.1007/978-3-319-28561-0_13
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The presented paper describes the design and implementation of distributed k-means clustering algorithm for text documents analysis. Motivation for the research effort presented in this paper is to propose a distributed approach based on current in-memory distributed computing technologies. We have used our Jbowl java text mining library and GridGain as a framework for distributed computing. Using these technologies we have designed and implemented k-means distributed clustering algorithm in two modifications and performed the experiments on the standard text data collections. Experiments were conducted in two testing environments-a distributed computing infrastructure and on a multi-core server.
引用
收藏
页码:165 / 174
页数:10
相关论文
共 50 条
  • [1] Subspace clustering of text documents with feature weighting K-means algorithm
    Jing, LP
    Ng, MK
    Xu, J
    Huang, JZ
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518 : 802 - 812
  • [2] Chinese text clustering algorithm based k-means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    [J]. 2012 INTERNATIONAL CONFERENCE ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING (ICMPBE2012), 2012, 33 : 301 - 307
  • [3] Chinese Text Clustering Algorithm Based K-Means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    [J]. 2011 AASRI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRY APPLICATION (AASRI-AIIA 2011), VOL 1, 2011, : 90 - 93
  • [4] Weighted k-Means Algorithm Based Text Clustering
    Chen, Xiuguo
    Yin, Wensheng
    Tu, Pinghui
    Zhang, Hengxi
    [J]. IEEC 2009: FIRST INTERNATIONAL SYMPOSIUM ON INFORMATION ENGINEERING AND ELECTRONIC COMMERCE, PROCEEDINGS, 2009, : 51 - +
  • [5] AN APPROACH FOR TEXT CLUSTERING USING MODIFIED K-MEANS ALGORITHM
    Rose, J. Dafni
    Mukherjee, Saswati
    [J]. 4TH INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGY AND ENGINEERING (ICSTE 2012), 2012, : 243 - 247
  • [6] A Semi-Supervised Text Clustering Approach Based on K-Means Algorithm
    Zhan, Lizhang
    Xu, Hong
    Chen, Xiuguo
    [J]. INTERNATIONAL CONFERENCE ON ENGINEERING AND BUSINESS MANAGEMENT (EBM2011), VOLS 1-6, 2011, : 2616 - 2620
  • [7] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
    Mao, Yingchi
    Xu, Ziyang
    Li, Xiaofang
    Ping, Ping
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 3149 - 3156
  • [8] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
    Mao, Yingchi
    Xu, Ziyang
    Ping, Ping
    Wang, Longbao
    [J]. 2015 NINTH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY FCST 2015, 2015, : 386 - 391
  • [9] Design and application of a text clustering algorithm based on parallelized k-means clustering
    Wang H.
    Zhou C.
    Li L.
    [J]. Revue d'Intelligence Artificielle, 2019, 33 (06) : 453 - 460
  • [10] An improved K-Means text clustering algorithm based on Local Search
    Liu, Xiangwei
    [J]. 2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 11578 - 11581