Structure-Based Supervised Term Weighting and Regularization for Text Classification

Authors
Shanavas, Niloofer [1]
Wang, Hui [1]
Lin, Zhiwei [1]
Hawe, Glenn [1]
Affiliations
[1] Ulster University, School of Computing, Jordanstown, Northern Ireland
Keywords
Text mining; Classification; Graph-based text representation; Supervised term weighting; Node centrality; Structured regularization; Regression; Selection
DOI
10.1007/978-3-030-23281-8_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text documents contain rich information that can be useful for many tasks. How to utilise this information effectively and efficiently for tasks such as text classification is still an active research topic. One approach is to weight the terms in a text document according to their relevance to the classification task at hand. Another is to use structural information in a text document to regularize learning so that the learned model is more accurate. An important question is whether the two approaches can be combined to achieve better performance. This paper presents a novel method for utilising the rich information in texts. We use supervised term weighting, which utilises the class information in a set of pre-classified training documents, so the resulting term weights are class-specific. We also use structured regularization, which incorporates structural information into the learning process. A graph is built for each class from the pre-classified training documents, and the structural information in these graphs is used both to calculate the supervised term weights and to define the groups for structured regularization. Experimental results on six text classification tasks show that using structural information in text for both weighting and regularization increases classification accuracy. Using a graph-based text representation for supervised term weighting and structured regularization yields a compact model with a considerable improvement in text classification performance.
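The pipeline outlined in the abstract (one graph per class, node centrality as the source of class-specific term weights, graph structure as the source of regularization groups) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, since the abstract does not give the paper's actual formulas: the sliding co-occurrence window, the use of degree centrality, the normalisation of a term's centrality across classes, and the use of connected components as groups. All function names are hypothetical.

```python
from collections import defaultdict


def build_class_graphs(docs, labels, window=2):
    """Build one undirected co-occurrence graph per class.

    Two terms are linked if one appears within `window` tokens after
    the other in any training document of that class (an assumption).
    """
    graphs = defaultdict(lambda: defaultdict(set))
    for tokens, label in zip(docs, labels):
        graph = graphs[label]
        for i, term in enumerate(tokens):
            graph[term]  # register the node even if it gains no edges
            for other in tokens[i + 1 : i + 1 + window]:
                if other != term:
                    graph[term].add(other)
                    graph[other].add(term)
    return graphs


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree."""
    n = len(graph)
    if n < 2:
        return {term: 0.0 for term in graph}
    return {term: len(neigh) / (n - 1) for term, neigh in graph.items()}


def class_term_weights(graphs):
    """Illustrative class-specific weight: a term's centrality in one
    class graph as a share of its centrality summed over all classes.
    This is NOT the weighting formula from the paper."""
    centrality = {c: degree_centrality(g) for c, g in graphs.items()}
    weights = {}
    for c, scores in centrality.items():
        weights[c] = {}
        for term, score in scores.items():
            total = sum(centrality[k].get(term, 0.0) for k in centrality)
            weights[c][term] = score / total if total > 0 else 0.0
    return weights


def connected_groups(graph):
    """Connected components of a class graph, used here as a stand-in
    for the term groups that a structured regularizer would penalise."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(component)
    return groups


# Toy pre-classified training set (invented for this example).
docs = [
    ["stock", "market", "falls"],
    ["market", "rally", "stock"],
    ["team", "wins", "match"],
    ["stock", "team", "match"],
]
labels = ["finance", "finance", "sport", "sport"]

graphs = build_class_graphs(docs, labels)
weights = class_term_weights(graphs)
finance_groups = connected_groups(graphs["finance"])
print(weights["finance"]["stock"], weights["sport"]["stock"])
```

In this toy data, "stock" occurs in both classes but is more central in the finance graph, so it receives a larger weight there. A real system would multiply such weights into the document vectors and pass the groups to a group-lasso style penalty; both steps are omitted here.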
Pages: 105-117
Number of pages: 13