Hierarchical Sequence Clustering Algorithm for Data Mining

Cited by: 0
Authors
Chezhian, V. Umadevi [1]
Subash, Thanappan [2]
Samy, M. Ragavan [3]
Affiliations
[1] Coll Business & Econ, Dept Comp Sci, Asmera, Eritrea
[2] Eritrea Inst Technol, Dept Civil Engn, Abardae, Eritrea
[3] Singapore Refinery Co Pvt Ltd, Singapore, Singapore
Keywords
Data Mining; Hierarchical Clustering; Sequence Clustering; Probabilistic Suffix Tree; UPGMA; Suffix Tree Construction
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Bioinformatics has emerged as a challenging new area of research and has brought forth numerous computational problems; in it, computers are used to gather, store, analyze, and merge biological data. In this paper, the problem of clustering interval-scaled data and sequence data is analyzed with a new approach, Hierarchical Sequence Clustering. Sequence clustering requires the similarity or distance between each pair of sequences, and the Probabilistic Suffix Tree data structure can be used to compute that similarity. An agglomerative algorithm based on UPGMA (Unweighted Pair Group Method with Arithmetic Mean) cluster analysis is introduced first; it requires O(n^3) total computing time. A new algorithm using the new approach is then introduced with O(n^2) computing time. The results of the new algorithm are compared with those of UPGMA cluster analysis.
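For readers unfamiliar with the UPGMA baseline named in the abstract, the following is a minimal Python sketch of naive average-linkage agglomerative clustering. It is not the authors' implementation, and it does not reproduce their O(n^2) algorithm or the Probabilistic Suffix Tree similarity. The function name `upgma`, the `n_clusters` parameter, and the example distance matrix are assumptions made for illustration; the pairwise distances are taken as already computed. Because the average distance between every pair of clusters is recomputed from scratch at each merge, this version runs in roughly cubic time, in line with the O(n^3) figure quoted above.

```python
from itertools import combinations


def upgma(dist, n_clusters=1):
    """Naive UPGMA sketch (illustrative only, not the paper's code).

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns the final clusters and the list of merges performed, each
    merge being (members_a, members_b, average_distance).
    """
    clusters = [[i] for i in range(len(dist))]  # start with singletons
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        avg, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), avg))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical distances between four sequences (made-up values).
D = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
clusters, merges = upgma(D, n_clusters=2)
print(clusters)
```

With this toy matrix, sequences 0 and 1 are merged first and sequences 2 and 3 second, leaving two clusters when `n_clusters=2`.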
Pages: 1861-1864
Page count: 4
Related papers
50 in total
  • [1] A hierarchical clustering algorithm for categorical sequence data
    Oh, SJ
    Kim, JY
    [J]. INFORMATION PROCESSING LETTERS, 2004, 91 (03) : 135 - 140
  • [2] Hierarchical clustering for data mining
    Szymkowiak, A
    Larsen, J
    Hansen, LK
    [J]. KNOWLEDGE-BASED INTELLIGENT INFORMATION ENGINEERING SYSTEMS & ALLIED TECHNOLOGIES, PTS 1 AND 2, 2001, 69 : 261 - 265
  • [3] A sequence-element-based hierarchical clustering algorithm for categorical sequence data
    Oh, SJ
    Kim, JY
    [J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2005, 4 (01) : 81 - 96
  • [4] Hierarchical clustering for data mining by rbf network
    Ciftcioglu, Ö
    Sariyildiz, S
    [J]. DATA MINING II, 2000, 2 : 477 - 486
  • [5] Research on the high robustness data classification and the mining algorithm based on hierarchical clustering and KNN
    Li, Haohang
    Wang, Shen
    Tang, Rui
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND ELECTRONICS SYSTEMS (ICCES), 2016, : 1049 - 1054
  • [6] Performance Evaluation of Enhanced Hierarchical and Partitioning Based Clustering Algorithm (EPBCA) in Data Mining
    Singh, Gurpreet
    Kaur, Jaskaranjit
    Mulge, Yusuf
    [J]. PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON APPLIED AND THEORETICAL COMPUTING AND COMMUNICATION TECHNOLOGY (ICATCCT), 2015, : 805 - 810
  • [7] A modified clustering algorithm for data mining
    Xu, ZJ
    Wang, LS
    Luo, JC
    Zhang, JQ
    [J]. IGARSS 2005: IEEE International Geoscience and Remote Sensing Symposium, Vols 1-8, Proceedings, 2005, : 741 - 744
  • [8] An Effective Clustering Algorithm for Data Mining
    Vijendra, Singh
    Ashwini, Kelkar
    Laxman, Sahoo
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA STORAGE AND DATA ENGINEERING (DSDE 2010), 2010, : 250 - 253
  • [9] A fast hierarchical clustering algorithm for large-scale protein sequence data sets
    Szilagyi, Sandor M.
    Szilagyi, Laszlo
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2014, 48 : 94 - 101
  • [10] Assessment of hierarchical clustering methodologies for proteomic data mining
    Meunier, Bruno
    Dumas, Emilie
    Piec, Isabelle
    Bechet, Daniel
    Hebraud, Michel
    Hocquette, Jean-Francois
    [J]. JOURNAL OF PROTEOME RESEARCH, 2007, 6 (01) : 358 - 366