WORD COMBINATION KERNEL FOR TEXT CLASSIFICATION WITH SUPPORT VECTOR MACHINES

Authors
Zhang, Lujiang [1]
Hu, Xiaohui [1]
Affiliations
[1] Beijing Univ Aeronaut & Astronaut, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China
Keywords
Machine learning; kernel methods; support vector machines; text classification; word-combination kernel; categorization
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we propose a novel kernel for text categorization. The kernel is an inner product defined in the feature space generated by all word combinations of a specified length. A word combination is a collection of unique words co-occurring in the same sentence. A word combination of length k is weighted by the k-th root of the product of the inverse document frequencies (IDF) of its words. Because word order is discarded, word combination features are more compatible with the flexibility of natural language, and the feature dimensionality of documents can be reduced significantly, improving the sparseness of the feature representations. Because the words are restricted to the same sentence and multi-word combinations are considered, word combination features can capture similarity at a more specific level than single words. A computationally simple and efficient algorithm is proposed to calculate the kernel. We conducted a series of experiments on the Reuters-21578 and 20 Newsgroups datasets, in which the kernel achieves better performance than the word kernel and the word-sequence kernel. We also evaluated the computational efficiency of the kernel and observed the impact of the word combination length on performance.
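The definition in the abstract can be illustrated with a minimal Python sketch. It is a brute-force enumeration for illustration only, not the efficient algorithm the paper proposes. Several details are assumptions rather than facts from the abstract: the regex-based sentence and word splitting, the smoothed IDF formula, and the choice to let a feature's value be its weight multiplied by the number of sentences containing the combination. All function names (`sentences`, `idf_table`, `combo_features`, `word_combination_kernel`) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def sentences(doc):
    """Split a document into sentences; each becomes a sorted list of unique words.

    Assumption: sentences end at '.', '!' or '?'; words are lowercase alphanumerics.
    """
    out = []
    for sent in re.split(r"[.!?]+", doc):
        words = sorted(set(re.findall(r"[a-z0-9]+", sent.lower())))
        if words:
            out.append(words)
    return out


def idf_table(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update({w for sent in sentences(doc) for w in sent})
    return {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in df}


def combo_features(doc, k, idf):
    """Map a document to {word combination of length k: weighted count}."""
    feats = Counter()
    for sent in sentences(doc):
        known = [w for w in sent if w in idf]  # skip words with no IDF value
        # Sorted input makes each combination a canonical, order-free tuple.
        for combo in combinations(known, k):
            # Weight: k-th root of the product of the words' IDF values.
            weight = math.prod(idf[w] for w in combo) ** (1.0 / k)
            feats[combo] += weight
    return feats


def word_combination_kernel(doc_a, doc_b, k, idf):
    """Inner product of the two documents' word-combination feature vectors."""
    fa = combo_features(doc_a, k, idf)
    fb = combo_features(doc_b, k, idf)
    if len(fa) > len(fb):
        fa, fb = fb, fa  # iterate over the smaller feature map
    return sum(v * fb[c] for c, v in fa.items() if c in fb)


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat. Dogs bark.",
        "A cat sat quietly.",
        "Stocks fell sharply today.",
    ]
    idf = idf_table(docs)
    print(word_combination_kernel(docs[0], docs[1], 2, idf))
    print(word_combination_kernel(docs[0], docs[2], 2, idf))
```

In this sketch the first two documents share the pair (cat, sat) within a single sentence, so their kernel value for k = 2 is positive, while the third document shares no pair with the first and yields zero. A Gram matrix built from such values could in principle be passed to an SVM implementation that accepts precomputed kernels.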
Pages: 877-896
Page count: 20
Related papers
50 entries in total
  • [1] Word combination kernel for text classification with support vector machines
    School of Automation Science and Electrical Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
    [J]. Comput. Inf., 2013, 4 (877-896)
  • [2] Text Message Authorship Classification Using Kernel Support Vector Machines
    Kretchmar, Matt
    Zhao, Yifu
    [J]. 2014 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), VOL 2, 2014: 215-218
  • [3] Support Vector Machines and Word2vec for Text Classification with Semantic Features
    Lilleberg, Joseph
    Zhu, Yun
    Zhang, Yanqing
    [J]. PROCEEDINGS OF 2015 IEEE 14TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2015: 136-140
  • [4] Support Vector Machines based on a semantic kernel for text categorization
    Siolas, G
    d'Alché-Buc, F
    [J]. IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL V, 2000: 205-209
  • [5] Virtual examples for text classification with support vector machines
    Sassano, M
    [J]. PROCEEDINGS OF THE 2003 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2003: 208-215
  • [7] Text classification of news articles with support vector machines
    Paass, G
    Kindermann, J
    Leopold, E
    [J]. TEXT MINING AND ITS APPLICATIONS, 2004, 138: 53-64
  • [8] Dimension reduction in text classification with support vector machines
    Kim, H
    Howland, P
    Park, H
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2005, 6: 37-53
  • [9] Weighted Transductive Support Vector Machines for text classification
    Liu, Shuang
    Jia, Chuanying
    Ma, Heng
    [J]. DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E: 445-449
  • [10] Support Vector Machines with the Correlation Kernel for the Classification of Raman Spectra
    Kyriakides, Alexandros
    Kastanos, Evdokia
    Hadjigeorgiou, Katerina
    Pitris, Costas
    [J]. ADVANCED BIOMEDICAL AND CLINICAL DIAGNOSTIC SYSTEMS IX, 2011, 7890