Text Classification using Graph Mining-based Feature Extraction

Cited by: 9
Authors
Jiang, Chuntao [1]
Coenen, Frans [1]
Sanderson, Robert [1]
Zito, Michele [1]
Affiliations
[1] Univ Liverpool, Dept Comp Sci, Liverpool L69 3BX, Merseyside, England
DOI
10.1007/978-1-84882-983-1_2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A graph-based approach to document classification is described in this paper. The graph representation allows for a much more expressive document encoding than the more standard bag of words/phrases approach, and consequently gives improved classification accuracy. Document sets are represented as graph sets, to which a weighted graph mining algorithm is applied to extract frequent subgraphs; these are then further processed to produce feature vectors (one per document) for classification. Weighted subgraph mining is used to ensure classification effectiveness and computational efficiency: only the most significant subgraphs are extracted. The approach is validated and evaluated using several popular classification algorithms together with a real-world textual data set. The results demonstrate that the approach can outperform existing text classification algorithms on some data sets. As the size of the data set increases, further processing of the extracted frequent features becomes essential.
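The pipeline outlined in the abstract (documents to graphs, frequent subgraphs to per-document feature vectors) can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm: it assumes a word-adjacency graph encoding, mines only single-edge subgraphs, and uses a plain support count with no weighting. The function names `document_to_edges`, `frequent_edges`, and `to_feature_vectors`, as well as the toy documents, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[str, ...]  # an undirected edge, stored as a sorted word pair


def document_to_edges(text: str) -> Set[Edge]:
    """Assumed encoding: nodes are words, edges link adjacent words."""
    words = text.lower().split()
    return {tuple(sorted(pair)) for pair in zip(words, words[1:])
            if pair[0] != pair[1]}


def frequent_edges(graphs: List[Set[Edge]], min_support: int) -> List[Edge]:
    """Keep single-edge subgraphs that occur in at least min_support graphs.

    A real frequent subgraph miner would also grow larger patterns and,
    in the weighted setting, score them; both steps are omitted here.
    """
    counts = Counter(edge for graph in graphs for edge in graph)
    return sorted(edge for edge, n in counts.items() if n >= min_support)


def to_feature_vectors(graphs: List[Set[Edge]],
                       features: List[Edge]) -> List[List[int]]:
    """One binary vector per document: 1 if the document contains the feature."""
    return [[1 if f in graph else 0 for f in features] for graph in graphs]


# Toy documents, invented for this example.
docs = [
    "graph mining extracts frequent subgraphs",
    "frequent subgraphs become features",
    "graph mining builds features",
]
graphs = [document_to_edges(d) for d in docs]
features = frequent_edges(graphs, min_support=2)
vectors = to_feature_vectors(graphs, features)

print(features)  # [('frequent', 'subgraphs'), ('graph', 'mining')]
print(vectors)   # [[1, 1], [1, 0], [0, 1]]
```

The resulting vectors could be passed to any standard classifier. Raising `min_support` shrinks the feature set, which loosely mirrors the abstract's point that only the most significant subgraphs should be kept.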
Pages: 21-34
Number of pages: 14