An analysis of hierarchical text classification using word embeddings

Cited by: 130
Authors
Stein, Roger Alan [1]
Jaques, Patricia A. [1]
Valiati, Joao Francisco [2]
Affiliations
[1] Univ Vale Rio Sinos UNISINOS, Programa Posgrad Comp Aplicada PPGCA, Av Unisinos 950, Sao Leopoldo, RS, Brazil
[2] AIE, Rua Vieira Castro 262, Porto Alegre, RS, Brazil
Keywords
Hierarchical text classification; Word embeddings; Gradient tree boosting; fastText; Support vector machines
DOI
10.1016/j.ins.2018.09.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations (fastText, XGBoost, SVM, and Keras' CNN) and notable word embedding generation methods (GloVe, word2vec, and fastText) on publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an LCA-F1 of 0.893 on a single-labeled version of the RCV1 dataset. The analysis indicates that using word embeddings and their variants is a very promising approach for HTC. © 2018 Elsevier Inc. All rights reserved.
Pages: 216-232
Page count: 17