Semantic string operation for specializing AHC algorithm for text clustering

Cited by: 5
Author
Jo, Taeho [1]
Affiliation
[1] 190 Garosuro, Cheongju 28168, South Korea
Keywords
String vector; Semantic similarity; String vector based AHC algorithm; Text clustering
DOI
10.1007/s10472-019-09687-x
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article proposes a modified AHC (agglomerative hierarchical clustering) algorithm that clusters string vectors, instead of numerical vectors, as an approach to text clustering. Two facts motivate this research: string vector based algorithms were applied successfully to text clustering in previous works, and a synergy effect between text clustering and word clustering is expected from combining the two. In this research, we define an operation on string vectors, called semantic similarity, and modify the AHC algorithm by adopting it as the similarity metric for text clustering. The proposed AHC algorithm is empirically validated as the better approach for clustering texts in news articles and opinions. As future work, more operations on string vectors need to be defined and characterized mathematically in order to modify more advanced machine learning algorithms.
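The abstract does not spell out how semantic similarity between string vectors is computed. The following is a minimal sketch under stated assumptions and is not the author's definition. A string vector is taken to be a text's d most frequent words. The similarity of two words is taken to be the Jaccard overlap of the sets of texts that contain them. The similarity of two string vectors is the average word similarity over aligned positions. An average-linkage AHC loop then repeatedly merges the two most similar clusters until k clusters remain. All of these choices and every function name (`to_string_vector`, `word_similarity`, `string_vector_similarity`, `ahc`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def to_string_vector(text, d=5):
    """Assumed encoding: the d most frequent words of a text, most frequent first."""
    counts = Counter(text.lower().split())
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:d]


def word_similarity(w1, w2, corpus_sets):
    """Hypothetical word similarity: Jaccard overlap of the texts containing each word."""
    if w1 == w2:
        return 1.0
    in1 = {i for i, words in enumerate(corpus_sets) if w1 in words}
    in2 = {i for i, words in enumerate(corpus_sets) if w2 in words}
    union = in1 | in2
    return len(in1 & in2) / len(union) if union else 0.0


def string_vector_similarity(sv1, sv2, corpus_sets):
    """Assumed metric: average word similarity over aligned positions of two string vectors."""
    pairs = list(zip(sv1, sv2))  # zip stops at the shorter vector
    if not pairs:
        return 0.0
    return sum(word_similarity(a, b, corpus_sets) for a, b in pairs) / len(pairs)


def ahc(texts, k, d=5):
    """Average-linkage agglomerative clustering of texts until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    corpus_sets = [set(t.lower().split()) for t in texts]
    vectors = [to_string_vector(t, d) for t in texts]
    # Precompute pairwise text similarities, keyed by (smaller index, larger index).
    sim = {
        (i, j): string_vector_similarity(vectors[i], vectors[j], corpus_sets)
        for i, j in combinations(range(len(texts)), 2)
    }

    def linkage(a, b):
        # Average similarity over all cross-cluster pairs of texts.
        scores = [sim[(min(i, j), max(i, j))] for i in a for j in b]
        return sum(scores) / len(scores)

    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > k:
        # Pick the most similar pair of clusters (x < y) and merge y into x.
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    texts = [
        "stock market shares trading stock",
        "market shares stock prices trading",
        "football match goal team football",
        "team goal match football players",
    ]
    print(ahc(texts, 2))  # [[0, 1], [2, 3]]
```

Average linkage is only one common choice for this sketch. The abstract does not say which linkage the modified AHC uses, and the word-similarity measure can be swapped out without changing the clustering loop.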
Pages: 1083-1100
Page count: 18
Related papers
  • [1] Semantic string operation for specializing AHC algorithm for text clustering
    Jo, Taeho
    Annals of Mathematics and Artificial Intelligence, 2020, 88: 1083-1100
  • [2] String Vector based AHC for Text Clustering
    Jo, Taeho
    2017 19th International Conference on Advanced Communications Technology (ICACT) - Opening New Era of Smart Society, 2017: 673-678
  • [3] Graph based AHC Algorithm for Text Clustering
    Jo, Taeho
    Proceedings 2017 International Conference on Computational Science and Computational Intelligence (CSCI), 2017: 309-314
  • [4] An Incremental Algorithm of Text Clustering Based on Semantic Sequences
    Feng, Zhonghui
    Wuhan University Journal of Natural Sciences, 2006, (05): 1340-1344
  • [5] Text Clustering Algorithm Based on Semantic Graph Structure
    Bai, Qiuchan
    Jin, Chunxia
    Proceedings of 2016 9th International Symposium on Computational Intelligence and Design (ISCID), Vol 2, 2016: 312-316
  • [6] Table based AHC Algorithm for Clustering Words
    Jo, Taeho
    2016 18th International Conference on Advanced Communications Technology (ICACT) - Information and Communications for Safe and Secure Life, 2016: 570-575
  • [7] Genetic algorithm for text clustering based on latent semantic indexing
    Song, Wei
    Park, Soon Cheol
    Computers & Mathematics with Applications, 2009, 57(11-12): 1901-1907
  • [8] An Ontology-based Semantic Clustering Algorithm for Accounting Text
    Jiang, Yanhui
    Li, Mo
    Yao, Kaohua
    International Journal of Applied Mathematics & Statistics, 2013, 43(13): 59-67
  • [9] Semi-Supervised Semantic Dynamic Text Clustering Algorithm
    Qian, Zhi-Sen
    Huang, Rui-Zhang
    Wei, Qin
    Qin, Yong-Bin
    Chen, Yan-Ping
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2019, 48(06): 803-808
  • [10] Research on the Parallel Text Clustering Algorithm Based on the Semantic Tree
    Liu, Gangfeng
    Wang, Yunlan
    Zhao, Tianhai
    Li, Dongyang
    2011 6th International Conference on Computer Sciences and Convergence Information Technology (ICCIT), 2012: 400-403