Similarity Word-Sequence Kernels for Sentence Clustering

Cited by: 0
Authors
Andres-Ferrer, Jesus [1]
Sanchis-Trilles, German [1]
Casacuberta, Francisco [1]
Affiliations
[1] Univ Politecn Valencia, Dept Sistemas Informat & Computac, Inst Tecnol Informat, Valencia, Spain
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a novel clustering approach that uses kernels as similarity functions together with the C-means algorithm. Several word-sequence kernels are defined and then extended so that they satisfy the properties of similarity functions. These monolingual word-sequence kernels are in turn extended to bilingual word-sequence kernels, and both kinds are applied to monolingual and bilingual sentence clustering. The motivation is to group similar sentences into clusters so that a specialised model can be trained for each cluster, reducing both the size and the complexity of the initial task. We provide empirical evidence that bilingual kernels can lead to better clusters, as measured by intra-cluster perplexity.
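The pipeline the abstract describes (a word-sequence kernel turned into a similarity, a bilingual extension, and C-means clustering on top) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' definitions: the kernel is a plain contiguous n-gram count kernel, the similarity is its cosine normalisation, the bilingual similarity is the average of the source-side and target-side similarities, and the clustering is a simple kernel C-means with farthest-first seeding. All function names are hypothetical.

```python
from collections import Counter
import math


def ngram_counts(sentence, n=2):
    """Count all contiguous word n-grams of length 1..n (assumed feature map)."""
    words = sentence.lower().split()
    return Counter(
        tuple(words[i:i + k])
        for k in range(1, n + 1)
        for i in range(len(words) - k + 1)
    )


def kernel(a, b, n=2):
    """Inner product of n-gram count vectors: an assumed word-sequence kernel."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    return sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())


def similarity(a, b, n=2):
    """Cosine-normalised kernel, so values lie in [0, 1] and sim(x, x) = 1."""
    kab = kernel(a, b, n)
    if kab == 0:
        return 0.0
    return kab / math.sqrt(kernel(a, a, n) * kernel(b, b, n))


def bilingual_similarity(pair_a, pair_b, n=2):
    """Assumed bilingual extension: average of source and target similarities."""
    (src_a, tgt_a), (src_b, tgt_b) = pair_a, pair_b
    return 0.5 * (similarity(src_a, src_b, n) + similarity(tgt_a, tgt_b, n))


def kernel_c_means(items, sim, k, max_iter=20):
    """Kernel C-means over a precomputed similarity matrix (requires k <= len(items))."""
    n = len(items)
    K = [[sim(a, b) for b in items] for a in items]

    # Farthest-first seeding: each new seed is the item least similar to the seeds so far.
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: max(K[i][s] for s in seeds)))
    labels = [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]

    def dist(i, members):
        # Squared feature-space distance from item i to the centroid of `members`.
        if not members:
            return float("inf")
        cross = sum(K[i][j] for j in members) / len(members)
        within = sum(K[a][b] for a in members for b in members) / len(members) ** 2
        return K[i][i] - 2 * cross + within

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        new = [min(range(k), key=lambda c: dist(i, members[c])) for i in range(n)]
        if new == labels:
            break
        labels = new
    return labels


sentences = [
    "the cat sat",
    "the cat sat down",
    "stock prices fell",
    "stock prices fell sharply",
]
labels = kernel_c_means(sentences, similarity, k=2)
```

For bilingual clustering under the same assumptions, `items` would be a list of `(source, target)` sentence pairs and `sim` would be `bilingual_similarity`. The sketch does not compute perplexity; evaluating clusters that way would require training a language model per cluster.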
Pages: 610-619
Page count: 10