Boosting Short Text Classification by Solving the OOV Problem

Cited by: 1
Authors
Gao, Nan [1]
Wang, Yongjian [1]
Chen, Peng [1]
Tang, Jijun [2]
Affiliations
[1] Zhejiang Univ Technol, Coll Comp Sci & Technol, Coll Software, Hangzhou 310023, Zhejiang, Peoples R China
[2] Univ South Carolina, Coll Engn & Comp, Dept Comp Sci & Engn, Columbia, SC 29208 USA
Funding
National Natural Science Foundation of China
Keywords
Dual knowledge graph; knowledge enhancement; out of vocabulary problem; short text classification; NETWORKS;
DOI
10.1109/TASLP.2023.3316422
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Text classification has received considerable attention in natural language processing. Compared with long texts, short texts contain fewer words and lack contextual semantic information. Existing approaches enrich short texts by linking them to an external knowledge graph (KG), but they ignore the out-of-vocabulary (OOV) problem that arises during entity linking, especially for domain-oriented data containing rare words or domain-specific nouns. In this article, to alleviate the OOV problem caused by linking to the external KG, we propose a domain knowledge graph and an entity complementation strategy to improve the performance of short text classification. Specifically, the external knowledge graph is used to enrich the information in short texts, and a self-built domain knowledge graph handles entities that fail to link to the external knowledge graph. We conduct experiments on two datasets, (1) a labeled Chinese electronic domain dataset and (2) an open-source dataset, to test the performance of our algorithm under different data distributions. The results demonstrate that our dual knowledge graph model outperforms state-of-the-art short text classification methods, especially when the OOV problem is severe.
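The fallback idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two dictionaries stand in for the external and domain knowledge graphs, their entries are invented for illustration, and the function name `enrich` is hypothetical. The sketch only shows the lookup order, with a token tried against the external graph first and against the domain graph only when that lookup fails.

```python
from typing import Dict, List

# Toy stand-ins for the two knowledge graphs (hypothetical, invented entries).
EXTERNAL_KG: Dict[str, str] = {
    "battery": "a device that stores electrical energy",
    "phone": "a handheld communication device",
}
DOMAIN_KG: Dict[str, str] = {
    "mosfet": "a field-effect transistor used for switching",
}


def enrich(tokens: List[str]) -> Dict[str, List[str]]:
    """Look up each token in the external KG, falling back to the domain KG.

    Returns the tokens grouped by where they were resolved, plus the
    descriptions gathered along the way (assumed here to be the extra
    context appended to the short text).
    """
    result: Dict[str, List[str]] = {
        "external": [],      # resolved by the external KG
        "domain": [],        # OOV for the external KG, resolved by the domain KG
        "unresolved": [],    # found in neither graph
        "descriptions": [],  # knowledge collected for the classifier input
    }
    for token in tokens:
        key = token.lower()
        if key in EXTERNAL_KG:
            result["external"].append(token)
            result["descriptions"].append(EXTERNAL_KG[key])
        elif key in DOMAIN_KG:
            result["domain"].append(token)
            result["descriptions"].append(DOMAIN_KG[key])
        else:
            result["unresolved"].append(token)
    return result


example = enrich(["MOSFET", "battery", "datasheet"])
```

In this toy run, "battery" is resolved by the external graph, "MOSFET" is only resolved by the domain graph, and "datasheet" stays unresolved. A real system would use an entity linker and graph embeddings rather than exact dictionary lookups.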
Pages: 4014-4024
Number of pages: 11