Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1 ]
Rensing, Christoph [1 ]
Silberbauer, Johannes [1 ]
Affiliations
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20,Rundeturmstr 10, D-64283 Darmstadt, Germany
Source
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is of vital concern in text classification because it reduces the high dimensionality of the feature space. The wide range of statistical techniques that have been proposed for weighting and selecting features lose the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics in feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our intensive experiments, using the EUR-Lex dataset, showed that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency/inverse document frequency (TF-IDF) for feature weighting. In addition, after an aggressive dimensionality reduction of the original features by a factor of 10, the autoencoders are still capable of producing better features than BOW with TF-IDF.
Pages: 380-394
Number of pages: 15
Related papers
50 in total
  • [1] Linear Dimensionality Reduction for Multi-label Classification
    Ji, Shuiwang
    Ye, Jieping
    [J]. 21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, 2009, : 1077 - 1082
  • [2] A Review on Dimensionality Reduction for Multi-Label Classification
    Siblini, Wissam
    Kuntz, Pascale
    Meyer, Frank
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (03) : 839 - 857
  • [3] Integrating Label Semantic Similarity Scores into Multi-label Text Classification
    Chen, Zihao
    Liu, Yang
    Cheng, Baitai
    Peng, Jing
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT II, 2022, 13530 : 234 - 245
  • [4] Multi-label Text Classification Method Based on Label Semantic Information
    Xiao, Lin
    Chen, Bo-Li
    Huang, Xin
    Liu, Hua-Feng
    Jing, Li-Ping
    Yu, Jian
    [J]. Ruan Jian Xue Bao/Journal of Software, 2020, 31 (04) : 1079 - 1089
  • [5] Multi-label dimensionality reduction and classification with extreme learning machines
    Lin Feng
    Jing Wang
    Shenglan Liu
    Yao Xiao
    [J]. Journal of Systems Engineering and Electronics, 2014, 25 (03) : 502 - 513
  • [6] Multi-label dimensionality reduction and classification with extreme learning machines
    Feng, Lin
    Wang, Jing
    Liu, Shenglan
    Xiao, Yao
    [J]. JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2014, 25 (03) : 502 - 513
  • [7] Multi-label text classification model based on semantic embedding
    Yan Danfeng
    Ke Nan
    Gu Chao
    Cui Jianfei
    Ding Yiqi
    [J]. The Journal of China Universities of Posts and Telecommunications, 2019, 26 (01) : 95 - 104
  • [8] Multi-Label Text Classification Based on Shared Semantic Space
    Sun, Kun
    Qin, Bowen
    Sang, Jitao
    Yu, Jian
    [J]. Computer Engineering and Applications, 2023, 59 (12) : 100 - 105
  • [9] A Nonlinear Label Compression and Transformation Method for Multi-label Classification Using Autoencoders
    Wicker, Joerg
    Tyukin, Andrey
    Kramer, Stefan
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT I, 2016, 9651 : 328 - 340
  • [10] Dimensionality Reduction for Hierarchical Multi-Label Classification: A Systematic Mapping Study
    Vieira, Raimundo Osvaldo
    Borges, Helyane Bronoski
    [J]. JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2024, 30 (01) : 130 - 150