Keyword-Based Semi-Supervised Text Classification

Cited by: 4
Authors
Severin, Karl [1]
Gokhale, Swapna S. [1]
Dagnino, Aldo [2]
Affiliations
[1] Univ Connecticut, Comp Sci & Engn, Storrs, CT 06269 USA
[2] ABB, Cary, NC 27511 USA
DOI
10.1109/COMPSAC.2019.00067
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Industrial organizations generate massive volumes of data during their routine business and production activities. Such data may be structured (numerical or categorical), or it may be unstructured and textual. Both structured and unstructured data contain a wealth of knowledge that can help organizations improve their operations. Organizations find it easy to extract knowledge automatically from structured data. Unstructured data, however, must be mined and interpreted manually, which is cumbersome, error-prone, and time-consuming. This paper focuses on how to automatically analyze unstructured text data to extract important business value. It proposes a semi-supervised natural language (NL) approach to analyze a corpus of documents associated with accounts receivable disputes at a large corporation. The name "semi-supervised" derives from the philosophy underlying the methodology: a set of categories and the keywords associated with those categories are defined in consultation with domain experts. These categories and their associated keywords are then supplied as input to the algorithm, which automatically classifies the disputes into the predefined categories. The performance of the semi-supervised methodology is comparable to that of a random forest, which is a supervised learning approach. The paper discusses the benefits of the semi-supervised approach over supervised learning, namely a considerable reduction in the manual effort needed to analyze, understand, and label a training data set, without any noticeable degradation in performance.
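The abstract does not spell out the classification algorithm, so the following is only a minimal Python sketch of the general idea it describes: expert-defined categories and keywords go in, and each document is assigned to one of those categories. The category names, the keywords, the `classify` function, and the scoring rule (count keyword occurrences, pick the highest-scoring category) are all illustrative assumptions, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical categories and keywords. In the paper these are defined
# in consultation with domain experts; the values here are made up.
CATEGORY_KEYWORDS = {
    "pricing": {"price", "discount", "overcharged", "rate"},
    "delivery": {"late", "shipment", "delivery", "damaged"},
    "documentation": {"invoice", "paperwork", "missing", "po"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(text, category_keywords=CATEGORY_KEYWORDS, default="uncategorized"):
    """Assign the category whose keywords occur most often in the text.

    Assumed scoring rule: one point per keyword occurrence. If no keyword
    matches at all, the default label is returned. Ties go to the category
    listed first in the dictionary.
    """
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[keyword] for keyword in keywords)
        for category, keywords in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(classify("Customer says the shipment arrived late and damaged"))
    print(classify("Hello there"))
```

No labeled training documents are needed in this sketch; the only manual input is the keyword dictionary, which is the kind of effort reduction the abstract attributes to the approach.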
Pages: 417-422
Page count: 6