Word-sequence kernels

Cited by: 94
Authors
Cancedda, N [1 ]
Gaussier, E [1 ]
Goutte, C [1 ]
Renders, JM [1 ]
Affiliation
[1] Xerox Res Ctr Europe, F-38240 Meylan, France
Keywords
kernel machines; text categorisation; linguistic processing; string kernels; sequence kernels;
DOI
10.1162/153244303322533197
Chinese Library Classification code
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We address the problem of categorising documents using kernel-based methods such as Support Vector Machines. Since the work of Joachims (1998), there is ample experimental evidence that SVMs using the standard word frequencies as features yield state-of-the-art performance on a number of benchmark problems. Recently, Lodhi et al. (2002) proposed the use of string kernels, a novel way of computing document similarity based on matching non-consecutive subsequences of characters. In this article, we propose the use of this technique with sequences of words rather than characters. This approach has several advantages; in particular, it is more efficient computationally and it ties in closely with standard linguistic pre-processing techniques. We present some extensions to sequence kernels dealing with symbol-dependent and match-dependent decay factors, and present empirical evaluations of these extensions on the Reuters-21578 datasets.
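As an illustration of the idea summarised in the abstract, the following is a minimal sketch of a gap-weighted subsequence kernel applied to word tokens, following the dynamic-programming recursion of Lodhi et al. (2002). It is not the authors' implementation: the function names, whitespace tokenisation, and the single global decay factor `lam` are assumptions made for this example, and the symbol-dependent and match-dependent decay extensions mentioned in the abstract are not covered.

```python
import math


def word_sequence_kernel(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n between token lists s and t.

    Every word subsequence of length n common to s and t contributes
    lam ** (span in s + span in t), so matches with gaps are penalised.
    Names and signature are hypothetical, chosen for this sketch.
    """
    ls, lt = len(s), len(t)
    # kp[i][p][q] holds the auxiliary value K'_i(s[:p], t[:q]).
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for p in range(ls + 1):
        for q in range(lt + 1):
            kp[0][p][q] = 1.0  # K'_0 is 1 for every pair of prefixes
    for i in range(1, n):
        for p in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:p], t[:q])
            for q in range(i, lt + 1):
                match = lam * lam * kp[i - 1][p - 1][q - 1] if s[p - 1] == t[q - 1] else 0.0
                kpp = lam * kpp + match
                kp[i][p][q] = lam * kp[i][p - 1][q] + kpp
    # Sum over every pair of positions where the last matched word coincides.
    total = 0.0
    for p in range(n, ls + 1):
        for q in range(n, lt + 1):
            if s[p - 1] == t[q - 1]:
                total += lam * lam * kp[n - 1][p - 1][q - 1]
    return total


def normalised_kernel(s, t, n, lam):
    """Cosine-style normalisation, so documents of different length compare fairly."""
    denom = math.sqrt(word_sequence_kernel(s, s, n, lam) * word_sequence_kernel(t, t, n, lam))
    return word_sequence_kernel(s, t, n, lam) / denom if denom > 0 else 0.0


# Usage: documents are tokenised into words (here, naively, on whitespace).
doc_a = "the cat sat on the mat".split()
doc_b = "the cat lay on the mat".split()
similarity = normalised_kernel(doc_a, doc_b, n=2, lam=0.5)
```

Because the symbols are words rather than characters, the sequences are much shorter, which is consistent with the computational advantage the abstract claims; the cost of this sketch is proportional to n times the product of the two document lengths in tokens.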
Pages: 1059-1082
Page count: 24
Related papers
50 in total
  • [1] Similarity Word-Sequence Kernels for Sentence Clustering
    Andres-Ferrer, Jesus
    Sanchis-Trilles, German
    Casacuberta, Francisco
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2010, 6218 : 610 - 619
  • [2] HIERARCHIES OF PRIMITIVE RECURSIVE WORD-SEQUENCE FUNCTIONS - COMPARISONS AND DECISION-PROBLEMS
    FACHINI, E
    NAPOLI, M
    [J]. THEORETICAL COMPUTER SCIENCE, 1984, 29 (1-2) : 185 - 227
  • [3] Factored sequence kernels
    Cancedda, Nicola
    Mahe, Pierre
    [J]. NEUROCOMPUTING, 2009, 72 (7-9) : 1407 - 1413
  • [4] LEARNING SEQUENCE KERNELS
    Cortes, Corinna
    Mohri, Mehryar
    Rostamizadeh, Afshin
    [J]. 2008 IEEE WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2008, : 2 - +
  • [5] Composite Kernels for Automatic Word Sense Disambiguation
    Wang, Tinghua
    Zhong, Jian
    Chen, Junting
    Hu, Qi
    [J]. JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2015, 12 (04) : 619 - 623
  • [6] Parametric Kernels for Sequence Data Analysis
    Shin, Young-In
    Fussell, Donald
    [J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1047 - 1052
  • [7] Stem kernels for RNA sequence analyses
    Sakakibara, Yasubumi
    Asai, Kiyoshi
    Sato, Kengo
    [J]. BIOINFORMATICS RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2007, 4414 : 278 - +
  • [8] WORD SEQUENCE, WORD FREQUENCY, AND FREE RECALL
    MAY, RB
    TRYK, HE
    [J]. CANADIAN JOURNAL OF PSYCHOLOGY, 1970, 24 (05) : 299 - &
  • [9] Robust Word Similarity Estimation Using Perturbation Kernels
    Collins-Thompson, Kevyn
    [J]. ADVANCES IN INFORMATION RETRIEVAL THEORY, 2009, 5766 : 265 - 272
  • [10] Automated essay scoring with string kernels and word embeddings
    Cozma, Madalina
    Butnaru, Andrei M.
    Ionescu, Radu Tudor
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2018, : 503 - 509