Text Classification with Topic-based Word Embedding and Convolutional Neural Networks

Cited by: 31
Authors
Xu, Haotian [1]
Dong, Ming [1]
Zhu, Dongxiao [1]
Kotov, Alexander [1]
Carcone, April Idalski [2]
Naar-King, Sylvie [2]
Affiliations
[1] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
[2] Wayne State Univ, Pediat Prevent Res Ctr, Detroit, MI 48201 USA
Keywords
text classification; convolutional neural networks; word embeddings; medical subject headings
DOI
10.1145/2975167.2975176
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Distributed word embeddings trained by neural language models have recently become common in text classification with Convolutional Neural Networks (CNNs). In this paper, we propose a novel neural language model, Topic-based Skip-gram, to learn topic-based word embeddings for biomedical literature indexing with CNNs. Topic-based Skip-gram leverages textual content with topic models, e.g., Latent Dirichlet Allocation (LDA), to capture precise topic-based word relationships and then integrates them into distributed word embedding learning. We then describe two multimodal CNN architectures, which can employ different kinds of word embeddings at the same time for text classification. Through extensive experiments conducted on several real-world datasets, we demonstrate that the combination of our Topic-based Skip-gram and multimodal CNN architectures outperforms state-of-the-art methods in biomedical literature indexing, clinical note annotation, and general textual benchmark dataset classification.
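The abstract does not specify the two multimodal CNN architectures, so the following is only a minimal illustrative sketch of the general idea of feeding two kinds of word embeddings into a CNN text encoder at once. It assumes one convolutional branch per embedding table, ReLU activation, max-over-time pooling, and concatenation of the pooled features; these choices, the random embedding tables, and the names `conv_max_pool` and `two_channel_features` are hypothetical and are not taken from the paper.

```python
import numpy as np

def conv_max_pool(x, filters):
    """Apply 1-D convolution over tokens, then ReLU and max-over-time pooling.

    x: (seq_len, dim) embedded sentence; filters: (n_filters, width, dim).
    Assumes seq_len >= width. Returns a vector of shape (n_filters,).
    """
    n_filters, width, dim = filters.shape
    seq_len = x.shape[0]
    # Collect every window of `width` consecutive token vectors.
    windows = np.stack([x[i:i + width] for i in range(seq_len - width + 1)])
    # Dot each window with each filter -> (n_windows, n_filters).
    feature_maps = np.einsum("nwd,fwd->nf", windows, filters)
    # ReLU, then keep the strongest response of each filter over the sentence.
    return np.maximum(feature_maps, 0.0).max(axis=0)

def two_channel_features(token_ids, emb_a, emb_b, filters_a, filters_b):
    """Encode one sentence with two embedding tables and concatenate the results.

    Assumption: one independent convolutional branch per embedding type.
    """
    feats_a = conv_max_pool(emb_a[token_ids], filters_a)
    feats_b = conv_max_pool(emb_b[token_ids], filters_b)
    return np.concatenate([feats_a, feats_b])

# Toy data: random tables stand in for, e.g., generic and topic-based embeddings.
rng = np.random.default_rng(0)
vocab, dim_a, dim_b = 20, 8, 6
emb_a = rng.normal(size=(vocab, dim_a))
emb_b = rng.normal(size=(vocab, dim_b))
filters_a = rng.normal(size=(4, 3, dim_a))  # 4 filters spanning 3 tokens
filters_b = rng.normal(size=(4, 3, dim_b))
token_ids = np.array([1, 5, 7, 2, 9, 3])

features = two_channel_features(token_ids, emb_a, emb_b, filters_a, filters_b)
print(features.shape)
```

In a full classifier, the concatenated feature vector would typically be passed to a fully connected layer with a softmax or sigmoid output; that step is omitted here because the abstract gives no details about it.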
Pages: 88-97
Page count: 10