Using complex networks for text classification: Discriminating informative and imaginative documents

Cited by: 50
Authors
de Arruda, Henrique F. [1 ]
Costa, Luciano da F. [2 ]
Amancio, Diego R. [1 ]
Affiliations
[1] Univ Sao Paulo Sao Carlos, Inst Math & Comp Sci, Sao Paulo, Brazil
[2] Univ Sao Paulo Sao Carlos, Sao Carlos Inst Phys, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
HUMAN LANGUAGE; WORLD;
DOI
10.1209/0295-5075/113/28007
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702 ;
Abstract
Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has improved several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case with bag-of-words language models. These approaches have certainly yielded reasonable performance. However, some potential features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aimed at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts. Copyright (C) EPLA, 2016
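The general idea in the abstract, describing a text by local network measurements of its function words, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple word-adjacency network, a hypothetical six-word function-word list, and two stand-in measurements (degree and an entropy-based effective number of neighbours) in place of the symmetry and accessibility measurements named in the abstract. The names FUNCTION_WORDS, build_cooccurrence_network, local_features and document_vector are illustrative only.

```python
import math
import re
from collections import defaultdict

# Hypothetical, tiny function-word list used only for illustration;
# the abstract does not list the words actually used.
FUNCTION_WORDS = ("the", "of", "and", "a", "in", "to")


def build_cooccurrence_network(text):
    """Link each pair of adjacent words; edge weights count co-occurrences."""
    words = re.findall(r"[a-z]+", text.lower())
    weights = defaultdict(lambda: defaultdict(int))
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops from repeated words
            weights[left][right] += 1
            weights[right][left] += 1
    return weights


def local_features(weights, word):
    """Return (degree, effective number of neighbours) for one node.

    The second value is exp(entropy) of the edge-weight distribution
    around the node. It is a stand-in measurement (an assumption),
    not the accessibility or symmetry defined in the paper.
    """
    neighbours = weights.get(word, {})
    total = sum(neighbours.values())
    if total == 0:
        return (0, 0.0)  # word absent from the text
    probs = [w / total for w in neighbours.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return (len(neighbours), math.exp(entropy))


def document_vector(text):
    """Concatenate the local features of every function word."""
    weights = build_cooccurrence_network(text)
    vector = []
    for word in FUNCTION_WORDS:
        vector.extend(local_features(weights, word))
    return vector
```

Calling document_vector on each document would give one fixed-length feature vector per text, which could then be passed to any supervised classifier together with informative/imaginative labels. The classifier itself is left out because the abstract does not say which one was used.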
Pages: 6