A multi-layer text classification framework based on two-level representation model

Cited by: 25
Authors
Yun, Jiali [1]
Jing, Liping [1]
Yu, Jian [1]
Huang, Houkuan [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text classification; Text representation; Multi-layer classification; Wikipedia; Semantics;
DOI
10.1016/j.eswa.2011.08.027
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization is one of the most common themes in data mining and machine learning. Unlike structured data, unstructured text is harder to analyze because it carries complex syntactic and semantic information. In this paper, we propose a two-level representation model (2RM) for text data: one level represents syntactic information and the other represents semantic information. At the syntactic level, each document is represented as a term vector in which each component is the term's term frequency-inverse document frequency (TF-IDF) weight. At the semantic level, the document is represented by the Wikipedia concepts related to the terms of the syntactic level. We also design a multi-layer classification framework (MLCLA) that exploits the syntactic and semantic information represented in 2RM. The MLCLA framework contains three classifiers. Two of them are applied in parallel, one to the syntactic level and one to the semantic level. Their outputs are combined and fed to the third classifier, which produces the final result. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) show that 2RM combined with MLCLA improves text classification performance compared with existing flat text representation models (Term-based VSM, Term Semantic Kernel Model, Concept-based VSM, Concept Semantic Kernel Model and Term + Concept VSM) combined with existing classification methods. © 2011 Elsevier Ltd. All rights reserved.
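The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the classifiers, so a nearest-centroid classifier with cosine similarity is swapped in for all three. The `TERM_TO_CONCEPT` table is a hypothetical stand-in for the Wikipedia concept mapping, and all class, function and variable names are assumptions made for illustration. As a simplification, the third classifier is trained on the outputs that the first two produce for their own training documents.

```python
import math
from collections import Counter, defaultdict

# Hypothetical term-to-concept table; a stand-in for a real Wikipedia mapping.
TERM_TO_CONCEPT = {
    "goal": "Sport", "match": "Sport", "striker": "Sport",
    "league": "Sport", "penalty": "Sport",
    "cpu": "Computing", "kernel": "Computing", "ram": "Computing",
    "compiler": "Computing",
}

def fit_idf(docs):
    """Inverse document frequency (plus 1) for every token seen in docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n / count) + 1.0 for tok, count in df.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector; tokens unseen during fitting are dropped."""
    return {tok: tf * idf[tok] for tok, tf in Counter(doc).items() if tok in idf}

def to_concepts(doc):
    """Semantic level: replace each known term by its concept."""
    return [TERM_TO_CONCEPT[t] for t in doc if t in TERM_TO_CONCEPT]

def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

class CentroidClassifier:
    """Nearest-centroid classifier over sparse dict vectors (an assumption)."""

    def fit(self, vectors, labels):
        sums = defaultdict(lambda: defaultdict(float))
        for vec, label in zip(vectors, labels):
            for key, weight in vec.items():
                sums[label][key] += weight  # sums suffice: cosine ignores scale
        self.centroids = {label: dict(vec) for label, vec in sums.items()}
        self.classes = sorted(self.centroids)
        return self

    def scores(self, vec):
        return {c: cosine(vec, self.centroids[c]) for c in self.classes}

    def predict(self, vec):
        s = self.scores(vec)
        return max(self.classes, key=lambda c: s[c])

class TwoLevelClassifier:
    """Two parallel classifiers whose scores feed a third classifier."""

    def fit(self, docs, labels):
        concept_docs = [to_concepts(d) for d in docs]
        self.idf_terms = fit_idf(docs)
        self.idf_concepts = fit_idf(concept_docs)
        self.syntactic = CentroidClassifier().fit(
            [tfidf(d, self.idf_terms) for d in docs], labels)
        self.semantic = CentroidClassifier().fit(
            [tfidf(c, self.idf_concepts) for c in concept_docs], labels)
        # Simplification: the third classifier sees scores for training docs.
        self.final = CentroidClassifier().fit(
            [self._combined(d) for d in docs], labels)
        return self

    def _combined(self, doc):
        syn = self.syntactic.scores(tfidf(doc, self.idf_terms))
        sem = self.semantic.scores(tfidf(to_concepts(doc), self.idf_concepts))
        combined = {("syn", c): s for c, s in syn.items()}
        combined.update({("sem", c): s for c, s in sem.items()})
        return combined

    def predict(self, doc):
        return self.final.predict(self._combined(doc))

train_docs = [
    ["goal", "match", "striker"],
    ["league", "match", "goal"],
    ["cpu", "kernel", "ram"],
    ["compiler", "kernel", "cpu"],
]
train_labels = ["sports", "sports", "tech", "tech"]
model = TwoLevelClassifier().fit(train_docs, train_labels)
print(model.predict(["penalty", "referee"]))
```

In this toy setup the query shares no terms with the training documents, so the syntactic-level scores are all zero and only the concept level carries a signal into the third classifier. That is the kind of case the two-level representation is meant to help with, though the sketch says nothing about the gains reported in the paper.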
Pages: 2035-2046
Page count: 12