Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1]
Rensing, Christoph [1]
Silberbauer, Johannes [1]
Affiliations
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20, Rundeturmstr 10, D-64283 Darmstadt, Germany
Source
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is of vital concern in text classification because it reduces the high dimensionality of the feature space. The wide range of statistical techniques that have been proposed for weighting and selecting features lose the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics in feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our intensive experiments, using the EUR-lex dataset, showed that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency/inverse document frequency (TF-IDF) for feature weighting. In addition, after an aggressive dimensionality reduction of the original features by a factor of 10, the autoencoders are still capable of producing better features compared to BOW with TF-IDF.
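Illustrative sketch (not part of the original record): the abstract's baseline and reduction step can be pictured as BOW features weighted with TF-IDF, followed by an autoencoder whose hidden layer is one tenth the size of its input. The code below is a minimal sketch under stated assumptions, not the authors' implementation. The toy documents, the smoothed IDF formula, the single-hidden-layer architecture, and the names `tfidf` and `train_autoencoder` are all assumptions made for illustration; the record does not describe the paper's network or its semantic feature selection techniques, and neither is reproduced here.

```python
import numpy as np


def tfidf(docs):
    """Bag-of-words counts weighted by a smoothed IDF (an assumed variant)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            tf[i, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)  # number of documents containing each term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
    return tf * idf, vocab


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, factor=10, epochs=300, lr=0.5, seed=0):
    """Sigmoid encoder, linear decoder, mean squared reconstruction error.

    The hidden layer has d // factor units, so factor=10 mirrors the
    tenfold reduction mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, d // factor)
    W1 = rng.normal(0.0, 0.1, size=(d, k))
    b1 = np.zeros(k)
    W2 = rng.normal(0.0, 0.1, size=(k, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # encode
        R = H @ W2 + b2            # decode
        err = R - X
        losses.append(float((err ** 2).mean()))
        # Backpropagate the mean squared error.
        gR = 2.0 * err / err.size
        gW2 = H.T @ gR
        gb2 = gR.sum(axis=0)
        gZ = (gR @ W2.T) * H * (1.0 - H)
        gW1 = X.T @ gZ
        gb1 = gZ.sum(axis=0)
        # Plain full-batch gradient descent step.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def encode(A):
        return sigmoid(A @ W1 + b1)

    return encode, losses


# Hypothetical toy corpus; the paper itself uses the EUR-lex dataset.
docs = [
    "council regulation on agricultural import tariffs",
    "commission decision on fisheries quotas and import licences",
    "directive on environmental protection and water quality",
    "regulation on water quality standards for fisheries",
    "decision on state aid for regional transport infrastructure",
]
X, vocab = tfidf(docs)
encode, losses = train_autoencoder(X, factor=10)
codes = encode(X)  # reduced features, one row per document
print(X.shape, "->", codes.shape)
```

The reduced matrix `codes` would then replace the TF-IDF matrix as input to whatever multi-label classifier is used downstream.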
Pages: 380-394
Page count: 15
Related papers
50 in total
  • [21] Prototype Selection and Dimensionality Reduction on Multi-Label Data
    Hemavati
    Devi, V. Susheela
    Kuruvilla, Seba Ann
    Aparna, R.
    PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020), 2020, : 195 - 199
  • [22] Metalearning Applied to Multi-label Text Classification
    dos Santos, Vania Batista
    de Campos Merschmann, Luiz Henrique
    PROCEEDINGS OF 16TH BRAZILIAN SYMPOSIUM ON INFORMATION SYSTEMS ON DIGITAL TRANSFORMATION AND INNOVATION, SBSI 2020, 2020,
  • [23] Semi-Supervised Multi-Label Dimensionality Reduction
    Guo, Baolin
    Hou, Chenping
    Nie, Feiping
    Yi, Dongyun
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016, : 919 - 924
  • [24] Scalable Multi-Label Arabic Text Classification
    Ahmed, Nizar A.
    Shehab, Mohammed A.
    Al-Ayyoub, Mahmoud
    Hmeidi, Ismail
    2015 6TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2015, : 212 - 217
  • [25] Image to Text Translation by Multi-Label Classification
    Nasierding, Gulisong
    Kouzani, Abbas Z.
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS: WITH ASPECTS OF ARTIFICIAL INTELLIGENCE, 2010, 6216 : 247 - +
  • [26] A Neural Architecture for Multi-label Text Classification
    Coope, Sam
    Bachrach, Yoram
    Zukov-Gregoric, Andrej
    Rodriguez, Jose
    Maksak, Bogdan
    McMurtie, Conan
    Bordbar, Mahyar
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2019, 868 : 676 - 691
  • [27] Multi-label Classification of Legislative Text into EuroVoc
    Boella, Guido
    Di Caro, Luigi
    Lesmo, Leonardo
    Daniele, Rispoli
    Robaldo, Livio
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS (JURIX 2012), 2012, 250 : 21 - 30
  • [28] Multi-Label Arabic Text Classification: An Overview
    Aljedani, Nawal
    Alotaibi, Reem
    Taileb, Mounira
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (10) : 694 - 706
  • [29] Multi-label arabic text classification: an overview
    Aljedani N.
    Alotaibi R.
    Taileb M.
    International Journal of Advanced Computer Science and Applications, 2020, 11 (10) : 694 - 706
  • [30] Learning Semantic Similarity for Multi-label Text Categorization
    Li, Li
    Wang, Mengxiang
    Zhang, Longkai
    Wang, Houfeng
    CHINESE LEXICAL SEMANTICS, 2014, 8922 : 260 - 269