Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1 ]
Rensing, Christoph [1 ]
Silberbauer, Johannes [1 ]
Affiliation
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20, Rundeturmstr 10, D-64283 Darmstadt, Germany
Source
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders; FEATURE-SELECTION;
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is of vital concern in text classification for reducing the high dimensionality of the feature space. The wide range of statistical techniques that have been proposed for weighting and selecting features suffer from the loss of semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics into feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our extensive experiments on the EUR-Lex dataset show that the semantic-based feature selection techniques significantly outperform the frequency-based Bag-of-Words (BOW) feature selection method with term frequency/inverse document frequency (TF-IDF) feature weighting. In addition, even after an aggressive reduction of the original features by a factor of 10, the autoencoders still produce better features than BOW with TF-IDF.
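The BOW/TF-IDF baseline that the abstract compares against can be illustrated with a minimal Python sketch. It assumes whitespace tokenization, the plain weighting tf(t, d) * log(N / df(t)) with tf as count divided by document length, and a hypothetical selection rule that keeps the k terms with the highest summed TF-IDF weight. The abstract does not state the paper's exact weighting variant or selection threshold, and the function names `tfidf_vectors`, `select_top_k`, and `project` are illustrative only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each document's terms by TF-IDF: (count / doc length) * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def select_top_k(vectors, k):
    """Hypothetical selection rule: keep the k terms with the largest summed TF-IDF weight."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # adds the float weights term by term
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def project(vectors, vocabulary):
    """Map each sparse TF-IDF dict onto a dense vector over the selected vocabulary."""
    return [[vec.get(term, 0.0) for term in vocabulary] for vec in vectors]


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the cat ran"]
    vectors = tfidf_vectors(docs)
    vocab = select_top_k(vectors, 2)
    print(vocab)  # ['dog', 'ran']
    print(project(vectors, vocab))
```

In this toy corpus "the" occurs in every document and receives weight 0, while the rarer "dog" and "ran" are selected first. Because each document is reduced to unordered term counts, word order and the relations between adjacent words are discarded, which is the limitation the abstract attributes to purely statistical feature selection.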
Pages: 380-394
Page count: 15
Related papers (50 in total)
  • [41] Learning Video Features for Multi-label Classification
    Garg, Shivam
    COMPUTER VISION - ECCV 2018 WORKSHOPS, PT IV, 2019, 11132 : 325 - 337
  • [42] Research on Multi-Classification and Multi-Label in Text Categorization
    Hua, Liu
    2009 INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS, VOL 2, PROCEEDINGS, 2009, : 86 - 89
  • [43] Noisy multi-label semi-supervised dimensionality reduction
    Mikalsen, Karl Oyvind
    Soguero-Ruiz, Cristina
    Bianchi, Filippo Maria
    Jenssen, Robert
    PATTERN RECOGNITION, 2019, 90 : 257 - 270
  • [44] An Improved ML-kNN Multi-label Classification Model Based on Feature Dimensionality Reduction
    Li, Zhi-qiang
    Cao, Shuai-yi
    Guo, Hong-chen
    INTERNATIONAL CONFERENCE ON COMPUTER, MECHATRONICS AND ELECTRONIC ENGINEERING (CMEE 2016), 2016,
  • [45] Multi-label text classification with an ensemble feature space
    Tandon, Kushagri
    Chatterjee, Niladri
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (05) : 4425 - 4436
  • [46] Multi-label Classification with Clustering for Image and Text Categorization
    Nasierding, Gulisong
    Sajjanhar, Atul
    2013 6TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING (CISP), VOLS 1-3, 2013, : 869 - 874
  • [47] Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms
    Bromuri, Stefano
    Zufferey, Damien
    Hennebert, Jean
    Schumacher, Michael
    JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 51 : 165 - 175
  • [48] A Combined Approach for Multi-Label Text Data Classification
    Strimaitis, Rokas
    Stefanovic, Pavel
    Ramanauskaite, Simona
    Slotkiene, Asta
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [49] A novel reasoning mechanism for multi-label text classification
    Wang, Ran
    Ridley, Robert
    Su, Xi'ao
    Qu, Weiguang
    Dai, Xinyu
    INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (02)
  • [50] Academic Resource Text Hierarchical Multi-Label Classification
    Wang, Yue
    Li, Yawen
    Li, Ang
    Computer Engineering and Applications, 2023, 59 (13) : 92 - 98