A single-link method algorithm for clustering large document collections

Author
Kishida, K [1]
Affiliation
[1] Surugadai Univ, Hanno, Saitama, Japan
DOI
Not available
Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract
In the 1960s and 1970s, techniques for clustering a set of documents to improve the effectiveness or efficiency of information retrieval systems were widely explored. More recently, many researchers have made similar attempts in order to visualise search results, to provide browsing-based search modes, or to enhance performance when searching very large collections. The purpose of this paper is to develop an algorithm for hierarchical clustering that can work for very large document collections. The algorithm combines two ideas proposed by other researchers to save time and space in hierarchical clustering: (1) the use of an inverted file to reduce the number of document pairs for which a similarity degree is calculated, and (2) a procedure for constructing a dendrogram by the single-link method from similarity data recorded on disk rather than in main memory. The algorithm is applied experimentally to a document set of about 10,000 bibliographic records, and the processing time is analyzed empirically. In addition, the effects of removing words that appear frequently in documents are examined. We find that removing such words greatly reduces the processing time without significant change in the resulting set of clusters. Finally, an empirical comparison between the single-link method and the single-pass algorithm (leader-follower algorithm) is attempted.
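The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `max_df` cutoff for frequent words, and the Jaccard coefficient are all assumptions made for illustration, and the similarity data is sorted in memory here rather than read from disk as the abstract describes.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_file(docs, max_df=None):
    """Map each term to the sorted ids of the documents that contain it.

    Terms found in more than max_df documents are dropped. max_df is a
    hypothetical cutoff standing in for the removal of frequent words.
    """
    postings = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in terms:
            postings[term].add(doc_id)
    return {
        term: sorted(ids)
        for term, ids in postings.items()
        if max_df is None or len(ids) <= max_df
    }


def candidate_similarities(docs, inverted):
    """Score only the document pairs that share at least one indexed term."""
    shared = defaultdict(int)
    for ids in inverted.values():
        for a, b in combinations(ids, 2):
            shared[(a, b)] += 1
    sizes = [len({t for t in terms if t in inverted}) for terms in docs]
    # Jaccard coefficient over indexed terms (an assumed similarity measure).
    return {
        pair: n / (sizes[pair[0]] + sizes[pair[1]] - n)
        for pair, n in shared.items()
    }


def single_link_merges(n_docs, sims):
    """Return dendrogram merges as (similarity, cluster_a, cluster_b).

    Visiting pairs in descending similarity and joining them with
    union-find gives the single-link hierarchy.
    """
    parent = list(range(n_docs))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for (a, b), sim in sorted(sims.items(), key=lambda kv: (-kv[1], kv[0])):
        ra, rb = find(a), find(b)
        if ra != rb:
            low, high = min(ra, rb), max(ra, rb)
            parent[high] = low
            merges.append((sim, low, high))
    return merges


# Toy collection: each document is a list of index terms (illustrative data).
docs = [
    ["cluster", "document", "retrieval"],
    ["cluster", "document", "index"],
    ["query", "ranking"],
    ["query", "ranking", "feedback"],
]
inverted = build_inverted_file(docs)
sims = candidate_similarities(docs, inverted)
merges = single_link_merges(len(docs), sims)
```

In this toy collection only two of the six possible document pairs share a term, so only those two are scored. Passing a small `max_df` to `build_inverted_file` shortens the postings lists further, which is the kind of saving the abstract attributes to removing frequent words.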
Pages: 27-38
Page count: 12