Spectral Clustering based Active Learning with Applications to Text Classification

Cited by: 1
Authors
Guo, Wenbo [1]
Zhong, Chun [1]
Yang, Yupu [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
DOI
10.1051/matecconf/20165601003
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Active learning is a class of machine learning algorithms that choose for themselves the data samples from which they learn. It is widely used in data mining tasks such as text classification, where large amounts of unlabelled data are available but labels are hard to obtain. This paper proposes an improved active learning algorithm that exploits the distribution of the dataset to reduce the labelling cost and increase accuracy. Before the active learning process begins, a spectral clustering algorithm divides the dataset into two categories, and instances located at the boundary between the two categories are labelled to train the initial classifier. To reduce the computational cost, an incremental method is added to the algorithm. The algorithm is applied to several text classification problems, and the results show that it is more effective and more accurate than the traditional active learning algorithm.
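The seeding step described in the abstract (split the data in two with spectral clustering, then label instances near the boundary) might be sketched roughly as follows. This is not the authors' implementation: the Gaussian affinity, the bandwidth `sigma`, the use of the Fiedler vector's magnitude as a proxy for "located at the boundary", and the function names `spectral_bipartition` and `boundary_seed_indices` are all assumptions made for illustration. The incremental update mentioned in the abstract is not covered.

```python
import numpy as np


def spectral_bipartition(X, sigma=1.0):
    """Split the rows of X into two clusters with a normalized spectral cut.

    Assumes a Gaussian affinity with bandwidth `sigma` (an illustrative
    choice) and that every point has nonzero total affinity.
    """
    # Pairwise squared Euclidean distances, then Gaussian affinities.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1] * d_inv_sqrt

    # The sign of the Fiedler vector gives the two-way split; which side
    # is called 0 or 1 is arbitrary.
    labels = (fiedler > 0).astype(int)
    return labels, fiedler


def boundary_seed_indices(labels, fiedler, per_cluster):
    """Pick, in each cluster, the points whose Fiedler value is closest to 0.

    Assumption: a small |fiedler| value is used as a stand-in for
    "near the boundary between the two clusters".
    """
    seeds = []
    for c in (0, 1):
        members = np.flatnonzero(labels == c)
        order = np.argsort(np.abs(fiedler[members]))
        seeds.extend(members[order[:per_cluster]].tolist())
    return seeds


# Toy data standing in for document feature vectors (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(20, 2)),
    rng.normal([3.0, 0.0], 0.3, size=(20, 2)),
])
labels, fiedler = spectral_bipartition(X, sigma=1.0)
seed_idx = boundary_seed_indices(labels, fiedler, per_cluster=3)
print("instances to label first:", seed_idx)
```

The indices in `seed_idx` would be sent to an annotator, and the resulting labelled instances used to train the initial classifier before the usual active learning query loop starts. Selecting seeds per cluster, rather than globally, is a design choice here so that both sides of the split are represented in the initial training set.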
Pages: 5
Related papers
50 in total
  • [41] Combining fuzzy clustering with Naive Bayes augmented learning in text classification
    Liu, Lizhen
    Sun, Xiaojing
    Song, Hantao
    2006 1ST INTERNATIONAL SYMPOSIUM ON PERVASIVE COMPUTING AND APPLICATIONS, PROCEEDINGS, 2006, : 168 - +
  • [42] Multi-Spectral Image Classification Based on an Object-Based Active Learning Approach
    Su, Tengfei
    Zhang, Shengwei
    Liu, Tingxi
    REMOTE SENSING, 2020, 12 (03)
  • [43] Classification of Celestial Spectral Based on Improved Density Clustering
    Deng, Shiyu
    Tu, Liangping
    2017 10TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI), 2017,
  • [44] Spectral clustering based on learning similarity matrix
    Park, Seyoung
    Zhao, Hongyu
    BIOINFORMATICS, 2018, 34 (12) : 2069 - 2076
  • [45] Fuzzy based affinity learning for spectral clustering
    Li, Qilin
    Ren, Yan
    Li, Ling
    Liu, Wanquan
    PATTERN RECOGNITION, 2016, 60 : 531 - 542
  • [46] NSCKL: Normalized Spectral Clustering With Kernel-Based Learning for Semisupervised Hyperspectral Image Classification
    Su, Yuanchao
    Gao, Lianru
    Jiang, Mengying
    Plaza, Antonio
    Sun, Xu
    Zhang, Bing
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (10) : 6649 - 6662
  • [47] Applying active learning to assertion classification of concepts in clinical text
    Chen, Yukun
    Mani, Subramani
    Xu, Hua
    JOURNAL OF BIOMEDICAL INFORMATICS, 2012, 45 (02) : 265 - 272
  • [48] Improving Probabilistic Models In Text Classification Via Active Learning
    Bosley, Mitchell
    Kuzushima, Saki
    Enamorado, Ted
    Shiraito, Yuki
    AMERICAN POLITICAL SCIENCE REVIEW, 2024,
  • [49] Impact of Batch Size on Stopping Active Learning for Text Classification
    Beatty, Garrett
    Kochis, Ethan
    Bloodgood, Michael
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 306 - 307
  • [50] Effective Multi-Label Active Learning for Text Classification
    Yang, Bishan
    Sun, Jian-Tao
    Wang, Tengjiao
    Chen, Zheng
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 917 - 925