Word-sequence kernels

Times cited: 94
Authors
Cancedda, N [1]
Gaussier, E [1]
Goutte, C [1]
Renders, JM [1]
Affiliation
[1] Xerox Res Ctr Europe, F-38240 Meylan, France
Keywords
kernel machines; text categorisation; linguistic processing; string kernels; sequence kernels
DOI
10.1162/153244303322533197
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We address the problem of categorising documents using kernel-based methods such as Support Vector Machines. Since the work of Joachims (1998), there has been ample experimental evidence that SVMs using standard word frequencies as features yield state-of-the-art performance on a number of benchmark problems. Recently, Lodhi et al. (2002) proposed the use of string kernels, a novel way of computing document similarity based on matching non-consecutive subsequences of characters. In this article, we propose applying this technique to sequences of words rather than characters. This approach has several advantages: in particular, it is computationally more efficient, and it ties in closely with standard linguistic pre-processing techniques. We present some extensions to sequence kernels that deal with symbol-dependent and match-dependent decay factors, and we evaluate these extensions empirically on the Reuters-21578 datasets.
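The core computation the abstract refers to, matching non-consecutive subsequences while penalising gaps, can be illustrated with a short sketch. The Python below is a minimal, hedged example, assuming the gap-weighted subsequence kernel and dynamic-programming recursion of Lodhi et al. (2002) applied to lists of word tokens instead of characters, with a single uniform decay factor `lam`. It does not implement the symbol-dependent or match-dependent decay extensions described in the paper, and the function names `word_sequence_kernel` and `normalized_kernel` are hypothetical illustrations, not the authors' code.

```python
from math import sqrt
from typing import Sequence


def word_sequence_kernel(s: Sequence[str], t: Sequence[str], n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel K_n(s, t) over word tokens (sketch).

    Sums lam ** (span_in_s + span_in_t) over every pair of occurrences of a
    common, possibly non-contiguous, subsequence of n words. A single uniform
    decay factor `lam` is used (an assumption of this sketch).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0
    # kp[a][b] holds K'_i(s[:a], t[:b]); K'_0 is 1 for every pair of prefixes.
    kp = [[1.0] * (lt + 1) for _ in range(ls + 1)]
    for _ in range(1, n):
        nxt = [[0.0] * (lt + 1) for _ in range(ls + 1)]
        for a in range(1, ls + 1):
            kpp = 0.0  # K''_i(s[:a], t[:b]), updated incrementally as b grows
            for b in range(1, lt + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[a - 1][b - 1]
                nxt[a][b] = lam * nxt[a - 1][b] + kpp
        kp = nxt  # kp now holds the next level K'_i
    # Close each subsequence: every matching final word adds lam**2 * K'_{n-1}.
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[a - 1][b - 1]
    return total


def normalized_kernel(s: Sequence[str], t: Sequence[str], n: int, lam: float) -> float:
    """Cosine-normalised kernel K(s,t) / sqrt(K(s,s) * K(t,t))."""
    kss = word_sequence_kernel(s, s, n, lam)
    ktt = word_sequence_kernel(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return word_sequence_kernel(s, t, n, lam) / sqrt(kss * ktt)


# Usage: compare two pre-tokenised (hypothetical) sentences with 2-word subsequences.
doc1 = "the stock market fell sharply today".split()
doc2 = "the market fell today".split()
print(word_sequence_kernel(doc1, doc2, n=2, lam=0.5))
print(normalized_kernel(doc1, doc2, n=2, lam=0.5))
```

Each common n-word subsequence contributes `lam` raised to the total length it spans in both documents, so gapped matches still count but are discounted. The recursion runs in O(n·|s|·|t|) time, and because a document has far fewer words than characters, this is plausibly one source of the efficiency gain the abstract mentions.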
Pages: 1059-1082
Number of pages: 24
Related papers
50 in total
  • [31] SHARI'A: WEIGHT, ORDER AND SEQUENCE OF THE WORD
    Benmakhlouf, Ali
    [J]. TEMPS MODERNES, 2015, 70 (683) : 178 - 191
  • [32] Word and Table: The Origins of a Liturgical Sequence
    Cosgrove, Charles H.
    [J]. VIGILIAE CHRISTIANAE, 2020, 74 (04) : 357 - 373
  • [33] SEQUENCE ALIGNMENT BY WORD-PROCESSOR
    BOSWELL, DR
    [J]. TRENDS IN BIOCHEMICAL SCIENCES, 1987, 12 (07) : 279 - 280
  • [34] WEST: WORD ENCODED SEQUENCE TRANSDUCERS
    Variani, Ehsan
    Suresh, Ananda Theertha
    Weintraub, Mitchel
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019 : 7340 - 7344
  • [35] ON THE CHARACTERISTIC WORD OF THE INHOMOGENEOUS BEATTY SEQUENCE
    KOMATSU, T
    [J]. BULLETIN OF THE AUSTRALIAN MATHEMATICAL SOCIETY, 1995, 51 (02) : 337 - 351
  • [36] Word embeddings for protein sequence analysis
    Sequeira, Ana Marta
    Gomes, Ivan
    Rocha, Miguel
    [J]. 2023 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, CIBCB, 2023 : 98 - 105
  • [37] A Convolutional Architecture for Word Sequence Prediction
    Wang, Mingxuan
    Lu, Zhengdong
    Li, Hang
    Jiang, Wenbin
    Liu, Qun
    [J]. PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1, 2015 : 1567 - 1576
  • [38] Evolutionary optimization of sequence kernels for detection of bacterial gene starts
    Mersch, Britta
    Glasmachers, Tobias
    Meinicke, Peter
    Igel, Christian
    [J]. INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2007, 17 (05) : 369 - 381
  • [39] STATE-OF-THE-ART SEQUENCE KERNELS FOR SVM SPEAKER VERIFICATION
    Louradour, Jerome
    Daoudi, Khalid
    [J]. 2008 IEEE WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2008 : 498+
  • [40] String kernels for protein sequence comparisons: improved fold recognition
    Nojoomi, Saghi
    Koehl, Patrice
    [J]. BMC BIOINFORMATICS, 2017, 18