Word-sequence kernels

Cited by: 94
Authors
Cancedda, N [1]
Gaussier, E [1]
Goutte, C [1]
Renders, JM [1]
Affiliations
[1] Xerox Res Ctr Europe, F-38240 Meylan, France
Keywords
kernel machines; text categorisation; linguistic processing; string kernels; sequence kernels;
DOI
10.1162/153244303322533197
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We address the problem of categorising documents using kernel-based methods such as Support Vector Machines. Since the work of Joachims (1998), there is ample experimental evidence that SVMs using standard word frequencies as features yield state-of-the-art performance on a number of benchmark problems. Recently, Lodhi et al. (2002) proposed the use of string kernels, a novel way of computing document similarity based on matching non-consecutive subsequences of characters. In this article, we propose the use of this technique with sequences of words rather than characters. This approach has several advantages; in particular, it is more efficient computationally and it ties in closely with standard linguistic pre-processing techniques. We present some extensions to sequence kernels dealing with symbol-dependent and match-dependent decay factors, and present empirical evaluations of these extensions on the Reuters-21578 datasets.
Pages: 1059-1082
Page count: 24
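Illustrative sketch
The abstract describes document similarity based on matching non-consecutive subsequences, applied to words rather than characters. The Python sketch below is one possible reading of that idea, not the authors' implementation: it follows the standard gap-weighted subsequence recursion of Lodhi et al. (2002) with a single uniform decay factor and treats each word as one symbol. The function names (`word_sequence_kernel`, `normalized_kernel`), the parameter names (`n`, `lam`), and the whitespace tokenisation are assumptions made for illustration. The symbol-dependent and match-dependent decay extensions mentioned in the abstract are not covered.

```python
import math


def word_sequence_kernel(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n over two token lists.

    Each pair of matching length-n subsequences (gaps allowed) contributes
    lam ** (span in s + span in t). Cost is O(n * len(s) * len(t)).
    Names and parameters are illustrative assumptions, not from the paper.
    """
    ls, lt = len(s), len(t)
    # kp[i][a][b] holds the auxiliary value K'_i for prefixes s[:a], t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0  # K'_0 is 1 for every pair of prefixes

    for i in range(1, n):
        for a in range(1, ls + 1):
            kpp = 0.0  # running K''_i for s[:a] against a growing prefix of t
            for b in range(1, lt + 1):
                if s[a - 1] == t[b - 1]:
                    kpp = lam * (kpp + lam * kp[i - 1][a - 1][b - 1])
                else:
                    kpp = lam * kpp
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Close each subsequence on a final matching pair of tokens.
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_kernel(s, t, n, lam):
    """Cosine-style normalisation so that a document scores 1 with itself."""
    denom = math.sqrt(word_sequence_kernel(s, s, n, lam)
                      * word_sequence_kernel(t, t, n, lam))
    if denom == 0.0:
        return 0.0  # at least one document has no subsequence of length n
    return word_sequence_kernel(s, t, n, lam) / denom


if __name__ == "__main__":
    doc_a = "support vector machines".split()
    doc_b = "support vector regression".split()
    print(word_sequence_kernel(doc_a, doc_b, n=2, lam=0.5))
    print(normalized_kernel(doc_a, doc_b, n=2, lam=0.5))
```

Because symbols are whole words, sequences are far shorter than their character-level counterparts, which is consistent with the computational advantage the abstract claims. A Gram matrix built from `normalized_kernel` could be passed to any SVM implementation that accepts precomputed kernels.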