Using complex networks for text classification: Discriminating informative and imaginative documents

Cited by: 50
Authors
de Arruda, Henrique F. [1]
Costa, Luciano da F. [2]
Amancio, Diego R. [1]
Affiliations
[1] Univ Sao Paulo Sao Carlos, Inst Math & Comp Sci, Sao Paulo, Brazil
[2] Univ Sao Paulo Sao Carlos, Sao Carlos Inst Phys, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP);
Keywords
HUMAN LANGUAGE; WORLD;
DOI
10.1209/0295-5075/113/28007
Chinese Library Classification (CLC) number
O4 [Physics];
Subject classification code
0702;
Abstract
Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has improved several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case with bag-of-words language models. These approaches have certainly yielded reasonable performance. However, potentially useful features such as the structural organization of texts have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts. Copyright (C) EPLA, 2016
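To make the idea of "local topological/dynamical properties of function words" concrete, here is a minimal Python sketch of that kind of feature extraction. It is an illustration under stated assumptions, not the authors' method: it assumes an undirected word-adjacency network (consecutive words linked), a simple regex tokenizer, and a hypothetical function-word list (`FUNCTION_WORDS`). For each function word it computes degree, local clustering coefficient, and an accessibility value, taken here as the exponential of the Shannon entropy of an ordinary h-step random walk from that word. The walk model is a simplifying assumption and may differ from the one used in the paper. The symmetry measurements mentioned in the abstract are not implemented.

```python
import math
import re
from collections import defaultdict

# Hypothetical, illustrative subset of English function words (not the paper's list).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]


def tokenize(text):
    """Lower-case the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_adjacency_network(tokens):
    """Undirected word-adjacency network: link every pair of consecutive tokens."""
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops such as "very very"
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree(graph, node):
    """Number of distinct neighbours (0 if the word never occurs)."""
    return len(graph.get(node, ()))


def clustering(graph, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    neighbours = list(graph.get(node, ()))
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


def accessibility(graph, node, h=2):
    """exp(entropy) of the h-step distribution of an ordinary random walk from node.

    Assumption: a plain (not self-avoiding) random walk is used for simplicity.
    Returns 0.0 for words absent from the text (an arbitrary placeholder choice).
    """
    if node not in graph:
        return 0.0
    probs = {node: 1.0}
    for _ in range(h):
        step = defaultdict(float)
        for u, p in probs.items():
            for v in graph[u]:
                step[v] += p / len(graph[u])  # uniform transition to each neighbour
        probs = step
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return math.exp(entropy)


def feature_vector(text, function_words=FUNCTION_WORDS, h=2):
    """Concatenate (degree, clustering, accessibility) for each function word."""
    graph = build_adjacency_network(tokenize(text))
    features = []
    for word in function_words:
        features.extend(
            [degree(graph, word), clustering(graph, word), accessibility(graph, word, h)]
        )
    return features
```

Each document yields a fixed-length vector (three values per function word), so a corpus labelled as informative or imaginative could be turned into a feature matrix and passed to any supervised classifier; which classifier to use is left open in this sketch.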
Pages: 6
Related papers
50 in total
  • [21] Text mining in the classification of digital documents
    Contreras Barrera, Marcial
    BIBLIOS-REVISTA DE BIBLIOTECOLOGIA Y CIENCIAS DE LA INFORMACION, 2016, (64) : 33 - 43
  • [22] A fuzzy approach to classification of text documents
    Liu, WY
    Song, N
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2003, 18 (05) : 640 - 647
  • [23] Discriminating between empirical studies and nonempirical works using automated text classification
    Langlois, Alexis
    Nie, Jian-Yun
    Thomas, James
    Hong, Quan Nha
    Pluye, Pierre
    RESEARCH SYNTHESIS METHODS, 2018, 9 (04) : 587 - 601
  • [24] Text Classification using Triplet Capsule Networks
    Wu, Yujia
    Li, Jing
    Chen, Vincent
    Chang, Jun
    Ding, Zhiquan
    Wang, Zhi
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [25] Arabic text classification using Polynomial Networks
    Al-Tahrawi, Mayy M.
    Al-Khatib, Sumaya N.
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2015, 27 (04) : 437 - 449
  • [26] Automatic text classification using words networks
    Pablo Cardenas, Juan
    Olivares, Gaston
    Alfaro, Rodrigo
    REVISTA SIGNOS, 2014, 47 (86) : 346 - 364
  • [27] Study on Text Classification using Capsule Networks
    Katarya, Rahul
    Arora, Yamini
    2019 5TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING & COMMUNICATION SYSTEMS (ICACCS), 2019, : 501 - 505
  • [28] Interpretable Text Classification in Legal Contract Documents using Tsetlin Machines
    Saha, Rupsa
    Jyhne, Sander
    2022 INTERNATIONAL SYMPOSIUM ON THE TSETLIN MACHINE (ISTM 2022), 2022, : 7 - 12
  • [29] CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH
    Danti, Ajit
    Bhushan, S. N. Bharath
    IIOAB JOURNAL, 2016, 7 (02) : 45 - 50
  • [30] Multi-Label Classification of Text Documents Using Deep Learning
    Mohammed, Hamza Haruna
    Dogdu, Erdogan
    Gorur, Abdul Kadir
    Choupani, Roya
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 4681 - 4689