Text classification using tree kernels and linguistic information

Times cited: 3
Authors
Goncalves, Teresa [1]
Quaresma, Paulo [1]
Affiliation
[1] Univ Evora, Dept Informat, P-7000671 Evora, Portugal
DOI
10.1109/ICMLA.2008.78
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Standard machine learning approaches to text classification use the bag-of-words representation of documents to derive the classification target function. Typical linguistic structures such as morphology, syntax and semantics are completely ignored in the learning process. This paper examines the role of these structures in classifier construction, applying the study to the Portuguese language. Classifiers are built using the SVM algorithm on a dataset of newspaper articles. The results show that syntactic structure is not useful for text classification (as initially expected), but that a novel structured representation using the documents' semantic information has the same discriminative power over classes as the traditional bag-of-words representation.
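The abstract contrasts the bag-of-words representation with structured document representations compared through kernels inside an SVM. The sketch below is a minimal illustration, not the authors' implementation: it computes a linear kernel on word-count vectors and the subset-tree kernel of Collins and Duffy, a standard tree kernel used here as an assumed stand-in for whatever kernel the paper actually uses. Parse trees are assumed to be nested tuples; the function names, the decay parameter `lam` and the toy Portuguese trees are hypothetical.

```python
# Illustrative sketch (assumption): not the authors' code. It compares two
# documents either as bags of words or as parse trees, via a kernel function.
from collections import Counter
from functools import lru_cache
from math import prod, sqrt

# Hypothetical tree format: (label, child, child, ...); words are plain strings,
# e.g. ("N", "jornal") is a preterminal node dominating the word "jornal".


def bow_kernel(doc_a: str, doc_b: str) -> int:
    """Linear kernel (dot product) on bag-of-words count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    return sum(count * b[word] for word, count in a.items())


def _production(node: tuple) -> tuple:
    """Grammar rule at a node: its label plus its children's labels or words."""
    return (node[0], tuple(("word", c) if isinstance(c, str) else c[0] for c in node[1:]))


def _is_preterminal(node: tuple) -> bool:
    """True when every child is a word (leaf)."""
    return all(isinstance(c, str) for c in node[1:])


def _internal_nodes(tree: tuple) -> list:
    """All non-leaf nodes of the tree, in pre-order."""
    nodes = [tree]
    for child in tree[1:]:
        if not isinstance(child, str):
            nodes.extend(_internal_nodes(child))
    return nodes


def tree_kernel(t1: tuple, t2: tuple, lam: float = 1.0) -> float:
    """Subset-tree kernel (Collins and Duffy): weighted count of shared fragments."""

    @lru_cache(maxsize=None)
    def common(n1: tuple, n2: tuple) -> float:
        # Number of common fragments rooted at n1 and n2.
        if _production(n1) != _production(n2):
            return 0.0
        if _is_preterminal(n1):
            return lam
        # Equal productions mean the children align pairwise; word leaves add factor 1.
        return lam * prod(
            1.0 + common(c1, c2)
            for c1, c2 in zip(n1[1:], n2[1:])
            if not isinstance(c1, str)
        )

    return sum(common(a, b) for a in _internal_nodes(t1) for b in _internal_nodes(t2))


def normalised(kernel, x, y) -> float:
    """Cosine-style normalisation K(x, y) / sqrt(K(x, x) * K(y, y))."""
    return kernel(x, y) / sqrt(kernel(x, x) * kernel(y, y))


if __name__ == "__main__":
    t1 = ("S", ("NP", ("N", "jornal")), ("VP", ("V", "publica")))
    t2 = ("S", ("NP", ("N", "jornal")), ("VP", ("V", "vende")))
    print(bow_kernel("o jornal publica o artigo", "o jornal vende"))  # 3
    print(tree_kernel(t1, t2))  # 10.0
```

Either kernel can be evaluated over all pairs of training documents to build a Gram matrix for an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Normalising the kernel keeps large trees or long documents from dominating the similarity scores.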
Pages: 763-768
Number of pages: 6