Complex linguistic features for text classification: A comprehensive study

Authors
Moschitti, A [1]
Basili, R
Affiliations
[1] Univ Texas, Human Language Technol Res Inst, Richardson, TX 75083 USA
[2] Univ Roma Tor Vergata, Comp Sci Dept, I-00133 Rome, Italy
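Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract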
Previous research on advanced representations for document retrieval has shown that state-of-the-art statistical models are not improved by a variety of linguistic representations. Phrases, word senses and syntactic relations derived by Natural Language Processing (NLP) techniques were found to be ineffective at increasing retrieval accuracy. For Text Categorization (TC), fewer and less definitive studies on the use of advanced document representations are available, as it is a relatively new research area (compared to document retrieval). In this paper, advanced document representations are investigated. Extensive experimentation on representative classifiers, Rocchio and SVM, together with a careful analysis of the literature, was carried out to study how some NLP techniques used for indexing impact TC. Cross-validation over 4 different corpora in two languages allowed us to gather overwhelming evidence that complex nominals, proper nouns and word senses are not adequate to improve TC accuracy.
Pages: 181-196
Number of pages: 16
Related papers
  • [1] Using complex linguistic features in context-sensitive Text Classification techniques
    Wong, AKS
    Lee, JWT
    Yeung, DS
    [J]. PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 3183 - 3188
  • [2] Linguistic features integration for text classification tasks in Spanish
    Garcia-Diaz, Jose Antonio
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2023, (70): : 227 - 230
  • [3] Empirical investigation of fast text classification over linguistic features
    Basili, R
    Moschitti, A
    Pazienza, MT
    [J]. ECAI 2002: 15TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2002, 77 : 485 - 489
  • [4] Text plagiarism classification using syntax based linguistic features
    Vani, K.
    Gupta, Deepa
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 88 : 448 - 464
  • [5] Use of linguistic features in context-sensitive text classification
    Wong, Alex K. S.
    Lee, John W. T.
    Yeung, Daniel S.
    [J]. ADVANCES IN MACHINE LEARNING AND CYBERNETICS, 2006, 3930 : 701 - 710
  • [6] A Comprehensive Study of Text Classification Algorithms
    Vijayan, Vikas K.
    Bindu, K. R.
    Parameswaran, Latha
    [J]. 2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 1109 - 1113
  • [7] AUTOMATIC CLASSIFICATION OF TEXT WRITTEN BY EFL LEARNERS BASED ON LINGUISTIC FEATURES AND LEARNER FEATURES
    Kotani, Katsunori
    Yoshimi, Takehiko
    Uchida, Mayumi
    [J]. 7TH INTERNATIONAL TECHNOLOGY, EDUCATION AND DEVELOPMENT CONFERENCE (INTED2013), 2013, : 6305 - 6314
  • [8] Linguistic Features for Subjectivity classification
    Huong Nguyen Thi Xuan
    Anh Cuong Le
    Le Minh Nguyen
    [J]. 2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 17 - 20
  • [10] Text Genres and Registers: The Computation of Linguistic Features
    Pan, Fan
    Tao, Guoxiao
    [J]. SCIENTOMETRICS, 2017, 113 (03) : 1815 - 1818