Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1 ]
Rensing, Christoph [1 ]
Silberbauer, Johannes [1 ]
Affiliation
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20, Rundeturmstr 10, D-64283 Darmstadt, Germany
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is vital in text classification because it reduces the high dimensionality of the feature space. The many statistical techniques proposed for weighting and selecting features lose the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics into feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our extensive experiments on the EUR-Lex dataset showed that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency/inverse document frequency (TF-IDF) feature weighting. In addition, after an aggressive dimensionality reduction of the original features by a factor of 10, the autoencoders are still capable of producing better features than BOW with TF-IDF.
Pages: 380-394
Number of pages: 15
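
The BOW baseline with TF-IDF weighting that the abstract compares against can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tfidf` and `select_top_k`, the toy corpus, and the choice of ranking terms by summed weight are all assumptions made for illustration, and the variant used here (relative term frequency times the natural log of inverse document frequency) is only one common formulation.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted


def select_top_k(weighted, k):
    """Keep the k terms with the highest TF-IDF weight summed over documents."""
    totals = {}
    for weights in weighted:
        for term, weight in weights.items():
            totals[term] = totals.get(term, 0.0) + weight
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(totals, key=lambda term: (-totals[term], term))
    return ranked[:k]


# Hypothetical toy corpus; the paper itself uses the EUR-Lex dataset.
docs = [
    ["council", "regulation", "on", "fishing", "quotas"],
    ["commission", "regulation", "on", "import", "tariffs"],
    ["directive", "on", "fishing", "vessels"],
]
weights = tfidf(docs)
vocab = select_top_k(weights, 3)
```

A term such as "on" that occurs in every document receives weight zero and is never selected. Because the selection looks only at frequencies, it cannot tell that two different terms are semantically related, which is the limitation the abstract attributes to statistical techniques.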