Keyword-Based Semi-Supervised Text Classification

Cited by: 4
Authors
Severin, Karl [1]
Gokhale, Swapna S. [1]
Dagnino, Aldo [2]
Affiliations
[1] Univ Connecticut, Comp Sci & Engn, Storrs, CT 06269 USA
[2] ABB, Cary, NC 27511 USA
DOI
10.1109/COMPSAC.2019.00067
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Industrial organizations generate massive volumes of data during their routine business and production activities. Such data may be structured (numerical or categorical), or it may be unstructured and textual. Both kinds of data contain a wealth of knowledge that can help organizations improve their operations. Organizations can extract knowledge from structured data automatically with relative ease, but unstructured data must be mined and interpreted manually, which is cumbersome, error-prone and time-consuming. This paper focuses on how to automatically analyze unstructured text data to extract business value. It proposes a semi-supervised natural language (NL) approach to analyze a corpus of documents associated with accounts receivable disputes at a large corporation. The term semi-supervised derives from the philosophy underlying the methodology: a set of categories, and the keywords associated with each category, are defined in consultation with domain experts. These categories and their keywords are then supplied as input to the algorithm, which automatically classifies the disputes into the predefined categories. The performance of the semi-supervised methodology is comparable to that of random forest, a supervised learning approach. The paper discusses the benefits of the semi-supervised approach over supervised learning, namely a considerable reduction in the manual effort needed to analyze, understand and label a training data set, without any noticeable degradation in performance.
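The classification step described in the abstract can be illustrated with a minimal keyword-matching sketch. It assumes that each dispute is assigned to the category whose keywords occur most often in its text, and that documents with no keyword hits fall into a default bucket. The scoring rule, the `CATEGORY_KEYWORDS` dictionary, its category names and keywords, and the helper names `score` and `classify` are all hypothetical placeholders, not the authors' algorithm or the expert-defined categories from the paper.

```python
import re

# Hypothetical categories and keywords. In the paper, these are defined
# together with domain experts; the values here are placeholders only.
CATEGORY_KEYWORDS = {
    "pricing": ["price", "discount", "overcharge"],
    "delivery": ["late", "shipment", "not received"],
    "quality": ["defect", "damaged", "broken"],
}


def score(text, keywords):
    """Count whole-word (or whole-phrase) keyword occurrences, case-insensitively."""
    text = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(kw.lower()) + r"\b", text))
        for kw in keywords
    )


def classify(text, category_keywords, default="unclassified"):
    """Assign text to the category with the highest keyword score.

    Assumption for this sketch: ties go to the category listed first,
    and text with no keyword hits at all gets the default label.
    """
    scores = {cat: score(text, kws) for cat, kws in category_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    disputes = [
        "Customer says the shipment was late and not received",
        "Invoice price does not match the agreed discount",
        "Wrong PO number on invoice",
    ]
    for d in disputes:
        print(classify(d, CATEGORY_KEYWORDS), "<-", d)
```

In this sketch, the unclassified bucket collects disputes that no keyword covers. Such a bucket could serve as a review queue when refining the keyword lists, though the paper's actual handling of such cases is not described here.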
Pages: 417-422
Number of pages: 6
Related papers (50 in total)
  • [21] Research of PU Text Semi-Supervised Classification Based on Ontology Feature Extraction
    Luo, Na
    Yuan, Fuyu
    Zuo, WanLi
    He, Fengling
    SEVENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2008, : 835 - +
  • [22] A New SVM Method for Short Text Classification Based on Semi-Supervised Learning
    Yin, Chunyong
    Xiang, Jun
    Zhang, Hui
    Wang, Jin
    Yin, Zhichao
    Kim, Jeong-Uk
    2015 4TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION TECHNOLOGY AND SENSOR APPLICATION (AITS), 2015, : 100 - 103
  • [23] A Novel Semi-supervised Short Text Classification Algorithm Based on Fusion Similarity
    Li, Xiaohong
    Yan, Li
    Qin, Na
    Ran, Hongyan
    INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2017, PT III, 2017, 10363 : 309 - 319
  • [24] A genetic semi-supervised fuzzy clustering approach to text classification
    Liu, H
    Huang, ST
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2003, 2762 : 173 - 180
  • [25] Semi-Supervised Text Classification via Self-Pretraining
    Karisani, Payam
    Karisani, Negin
    WSDM '21: PROCEEDINGS OF THE 14TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2021, : 40 - 48
  • [26] TESC: An approach to TExt classification using Semi-supervised Clustering
    Zhang, Wen
    Tang, Xijin
    Yoshida, Taketoshi
    KNOWLEDGE-BASED SYSTEMS, 2015, 75 : 152 - 160
  • [27] Semi-Supervised Text Classification with Balanced Deep Representation Distributions
    Li, Changchun
    Li, Ximing
    Ouyang, Jihong
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5044 - 5053
  • [28] Automatic Bug Triage using Semi-Supervised Text Classification
    Xuan, Jifeng
    Jiang, He
    Ren, Zhilei
    Yan, Jun
    Luo, Zhongxuan
    22ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING & KNOWLEDGE ENGINEERING (SEKE 2010), 2010, : 209 - 214
  • [29] Semi-supervised text classification using positive and unlabeled data
    Yu, Shuang
    Zhou, Xueyuan
    Li, Chunping
    ADVANCES IN INTELLIGENT IT: ACTIVE MEDIA TECHNOLOGY 2006, 2006, 138 : 249 - 254
  • [30] Progressive Class Semantic Matching for Semi-supervised Text Classification
    Xu, Hai-Ming
    Liu, Lingqiao
    Abbasnejad, Ehsan
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 3003 - 3013