Hybrid embedding-based text representation for hierarchical multi-label text classification

Cited by: 26
Authors
Ma, Yinglong [1]
Liu, Xiaofeng [1]
Zhao, Lijiao [1]
Liang, Yue [1]
Zhang, Peng [1]
Jin, Beihong [2]
Affiliations
[1] North China Elect Power Univ, Sch Control & Comp Engn, Beijing 102206, Peoples R China
[2] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Hierarchical classification; Text classification; Multi-label classification; Graph embedding; Hybrid embedding; STATISTICAL COMPARISONS; CLASSIFIERS; INFORMATION;
DOI
10.1016/j.eswa.2021.115905
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many real-world text classification tasks deal with a large number of closely related categories organized in a hierarchical structure or taxonomy. Hierarchical multi-label text classification (HMTC) becomes particularly challenging when it must handle such large sets of closely related categories. The structural features of the categories in the hierarchy and the word semantics of their category labels are very helpful for improving classification accuracy over large sets of closely related categories, yet they have been neglected by most existing HMTC approaches. In this paper, we present a hybrid embedding-based text representation for HMTC with high accuracy. First, the hybrid embedding consists of both the graph embedding of the categories in the hierarchy and the word embedding of their category labels. A graph embedding model based on Structural Deep Network Embedding is used to simultaneously encode the global and local structural features of a given category in the whole hierarchy, making the category structurally discriminable. We further use word embedding to encode the word semantics of each category label in the hierarchy, making different categories semantically discriminable. Second, we present a level-by-level HMTC approach based on the bidirectional Gated Recurrent Unit network model together with the hybrid embedding, which is used to learn the representation of the text level by level. Finally, extensive experiments were conducted on five large-scale real-world datasets in comparison with state-of-the-art hierarchical and flat multi-label text classification approaches. The experimental results show that our approach is very competitive with the state-of-the-art approaches in classification accuracy, in particular maintaining computational costs while achieving superior performance.
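As a rough illustration of the hybrid embedding idea described in the abstract, the following minimal sketch combines a structural (graph) vector for a category with a semantic vector derived from the words of its label. The abstract does not state how the two parts are combined, so the concatenation and the averaging of label-word vectors here are assumptions, as are all names (`hybrid_embedding`, `GRAPH_EMB`, `WORD_EMB`, `LABEL_WORDS`) and the toy values. In practice the graph vectors would come from a trained graph embedding model and the word vectors from a pretrained word embedding model.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hybrid_embedding(
    category: str,
    graph_emb: Dict[str, Vector],
    word_emb: Dict[str, Vector],
    label_words: Dict[str, List[str]],
) -> Vector:
    """Build a hybrid vector for one category.

    Assumption: the structural part and the semantic part are simply
    concatenated; the source abstract does not specify the combination.
    """
    structural = graph_emb[category]  # position of the category in the hierarchy
    semantic = mean_vector([word_emb[w] for w in label_words[category]])
    return structural + semantic  # list concatenation, not element-wise addition


# Toy, hand-made values for illustration only (hypothetical data).
GRAPH_EMB = {"science": [0.9, 0.1], "physics": [0.7, 0.3]}
WORD_EMB = {
    "science": [1.0, 0.0, 0.0],
    "quantum": [0.0, 1.0, 0.0],
    "physics": [0.0, 0.0, 1.0],
}
LABEL_WORDS = {"science": ["science"], "physics": ["quantum", "physics"]}

vec = hybrid_embedding("physics", GRAPH_EMB, WORD_EMB, LABEL_WORDS)
print(vec)
```

In this sketch the first two components carry the structural information and the remaining three carry the label semantics, so two categories with similar label words but different positions in the hierarchy still receive different vectors.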
Pages: 12