A novel semi supervised approach for text classification

Cited by: 3
Authors
Barman D. [1]
Chowdhury N. [2]
Affiliations
[1] Department of Computer and System Sciences, Visva-Bharati, Santiniketan
[2] Department of Computer Science and Engineering, Jadavpur University, Kolkata
Keywords
Decision tree; Kohonen self organizing map; Naïve Bayes; Semi supervised learning; Support vector machine; Text categorization
DOI
10.1007/s41870-018-0137-9
Abstract
Text categorization, also known as text classification, is a supervised classification problem. It aims to assign a predefined class label or group to a new or unknown text document. Training the classifier usually requires a large collection of data from each class. However, labelled text data is hard or expensive to collect, and in most cases labels are assigned manually, which is neither cost-effective nor efficient. In this paper, we introduce a semi-supervised classification approach in which the learner needs only a very small amount of labelled data, together with a large amount of unlabelled data, to assign a class label to a new or unknown text document. The proposed method uses a Kohonen self-organizing map (SOM) to label the unlabelled data and three classifiers, namely support vector machine (SVM), Naïve Bayes (NB), and decision tree (DT; specifically, classification and regression tree, CART), to measure classification accuracy. The experimental results show the effectiveness of the proposed method. © 2018, Bharati Vidyapeeth's Institute of Computer Applications and Management.
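The abstract names the two stages of the method (SOM-based labelling of unlabelled documents, then a conventional classifier) but gives no implementation details. The following is a minimal pure-Python sketch of that general idea, not the authors' procedure. Everything specific in it is an assumption made for illustration: the toy documents, the 1-D map with four nodes, the rule that each node takes the majority label of the labelled documents it wins, and the function names `train_som`, `som_label`, `train_nb`, and `predict_nb`. Only a Naïve Bayes classifier is shown; the SVM and CART classifiers mentioned in the abstract would be trained on the same pseudo-labelled set.

```python
import math
import random
from collections import Counter, defaultdict

def tokens(doc):
    return doc.lower().split()

def vectorize(docs, vocab):
    """Map each document to a unit-length term-count vector over vocab."""
    vecs = []
    for doc in docs:
        counts = Counter(tokens(doc))
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(vecs, n_nodes=4, epochs=50, lr0=0.5, seed=0):
    """Train a small 1-D Kohonen map; returns one weight vector per node."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(vecs)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrinks linearly from 1 towards 0
        lr = lr0 * decay
        radius = max(0.5, (n_nodes / 2) * decay)
        for v in rng.sample(vecs, len(vecs)):  # visit documents in random order
            bmu = min(range(n_nodes), key=lambda i: sq_dist(nodes[i], v))
            for i in range(n_nodes):
                # Gaussian neighbourhood: nodes close to the BMU move more
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                nodes[i] = [w + lr * h * (x - w) for w, x in zip(nodes[i], v)]
    return nodes

def som_label(nodes, labelled_vecs, labels, unlabelled_vecs):
    """Assumed rule: a node takes the majority label of the labelled
    documents it wins; an unlabelled document takes the label of the
    nearest node that has one."""
    votes = defaultdict(Counter)
    for v, y in zip(labelled_vecs, labels):
        bmu = min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], v))
        votes[bmu][y] += 1
    node_label = {i: c.most_common(1)[0][0] for i, c in votes.items()}
    return [node_label[min(node_label, key=lambda j: sq_dist(nodes[j], v))]
            for v in unlabelled_vecs]

def train_nb(docs, labels):
    """Multinomial Naive Bayes: per-class word counts and class counts."""
    word_counts, vocab = defaultdict(Counter), set()
    for doc, y in zip(docs, labels):
        word_counts[y].update(tokens(doc))
        vocab.update(tokens(doc))
    return word_counts, Counter(labels), vocab

def predict_nb(model, doc):
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n in class_counts.items():
        denom = sum(word_counts[y].values()) + len(vocab)  # Laplace smoothing
        score = math.log(n / total)
        for w in tokens(doc):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[y][w] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best

# Toy data (hypothetical): two labelled and four unlabelled documents.
labelled_docs = ["goal match team win", "stock market bank profit"]
labels = ["sport", "finance"]
unlabelled = ["team goal score match", "bank stock profit rise",
              "match team win cup", "market bank loan stock"]

vocab = sorted({w for d in labelled_docs + unlabelled for w in tokens(d)})
nodes = train_som(vectorize(labelled_docs + unlabelled, vocab))
pseudo_labels = som_label(nodes, vectorize(labelled_docs, vocab), labels,
                          vectorize(unlabelled, vocab))
model = train_nb(labelled_docs + unlabelled, labels + pseudo_labels)
prediction = predict_nb(model, "team win match")
print(pseudo_labels, prediction)
```

The quality of the final classifier in such a scheme depends on how well the map separates the classes, since any mislabelled pseudo-examples are passed on to the classifier as if they were correct.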
Pages: 1147 - 1157
Page count: 10