Generalized k-medians clustering for strings

Cited by: 0
Authors
Martínez-Hinarejos, CD [1]
Juan, A [1]
Casacuberta, F [1]
Affiliations
[1] Univ Politecn Valencia, Inst Tecnol Informat, Dept Sistemes Informat & Computacio, Valencia 46022, Spain
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering methods are used in pattern recognition to obtain natural groups from a data set in the framework of unsupervised learning, as well as to obtain clusters of data from a known class. In sets of strings, the concept of the set median string can be extended to the (set) k-medians problem. The solution of the k-medians problem can be viewed as a clustering method, where each cluster is generated by one of the k strings of that solution. A concept related to the set median string is the (generalized) median string, whose computation is an NP-hard problem. However, different algorithms have been proposed to find approximations to the (generalized) median string. We propose extending the (generalized) median string problem to k strings, resulting in the generalized k-medians problem, which can also be viewed as a clustering technique. This new technique is applied to a corpus of chromosomes represented by strings and compared to the conventional k-medians technique.
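To make the conventional (set) k-medians idea from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Levenshtein edit distance as the string distance, takes the first k distinct strings as initial medians, and alternates assignment and median updates; the names `edit_distance`, `set_median`, and `set_k_medians` are hypothetical. It restricts each median to strings of the input set. The generalized variant, which searches over arbitrary strings and needs an approximation algorithm, is not shown.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumed choice of distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def set_median(strings: Sequence[str]) -> str:
    """String of the set that minimizes the summed distance to the whole set."""
    return min(strings, key=lambda c: sum(edit_distance(c, s) for s in strings))


def assign(strings: Sequence[str], medians: Sequence[str]) -> List[List[str]]:
    """Put every string in the cluster of its nearest median."""
    clusters: List[List[str]] = [[] for _ in medians]
    for s in strings:
        nearest = min(range(len(medians)),
                      key=lambda i: edit_distance(s, medians[i]))
        clusters[nearest].append(s)
    return clusters


def set_k_medians(strings: Sequence[str], k: int,
                  max_iter: int = 20) -> Tuple[List[str], List[List[str]]]:
    """Alternate assignment and set-median updates until the medians stop changing."""
    # Assumed initialization: the first k distinct strings of the input.
    medians = list(dict.fromkeys(strings))[:k]
    for _ in range(max_iter):
        clusters = assign(strings, medians)
        # An empty cluster keeps its previous median.
        updated = [set_median(c) if c else m for c, m in zip(clusters, medians)]
        if updated == medians:
            break
        medians = updated
    return medians, assign(strings, medians)


if __name__ == "__main__":
    data = ["aaaa", "aaab", "aaba", "cccc", "cccd", "ccdc"]
    medians, clusters = set_k_medians(data, k=2)
    print(medians)
    print(clusters)
```

Replacing `set_median` with an approximate generalized median routine would turn this loop into a sketch of the generalized k-medians technique the abstract proposes; the initialization and stopping rule here are illustrative choices only.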
Pages: 502-509
Number of pages: 8