Spectral Clustering based Active Learning with Applications to Text Classification

Cited by: 1
Authors
Guo, Wenbo [1]
Zhong, Chun [1]
Yang, Yupu [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
Keywords
DOI
10.1051/matecconf/20165601003
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Active learning is a class of machine learning algorithms that choose for themselves the data samples from which they learn. It is widely used in data mining tasks such as text classification, where large amounts of unlabelled data are available but labels are hard to obtain. This paper proposes an improved active learning algorithm that takes advantage of the distribution of the dataset to reduce labelling cost and increase accuracy. Before the active learning process begins, a spectral clustering algorithm divides the dataset into two categories, and the instances located at the boundary between the two categories are labelled to train the initial classifier. To reduce the computational cost, an incremental method is added to the algorithm. The algorithm is applied to several text classification problems, and the results show that it is more effective and more accurate than the traditional active learning algorithm.
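The initialisation step described in the abstract (split the data in two with spectral clustering, then label the instances near the boundary) can be illustrated with a minimal sketch. This is not the authors' implementation. The Gaussian affinity, the power-iteration eigensolver, the rule "smallest absolute Fiedler value means closest to the boundary", and all function names (`rbf_affinity`, `fiedler_embedding`, `select_initial_queries`) are assumptions made for illustration only.

```python
import math
import random


def rbf_affinity(points, sigma=1.0):
    """Dense Gaussian affinity matrix W (diagonal included, so W is PSD)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma ** 2))
             for q in points] for p in points]


def fiedler_embedding(W, iters=500, seed=0):
    """Second eigenvector of D^-1/2 W D^-1/2, rescaled by D^-1/2.

    Power iteration with the known top eigenvector (proportional to
    sqrt(degree)) projected out at every step.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    M = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]  # unit-length top eigenvector
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]  # remove top component
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    return [v[i] / math.sqrt(deg[i]) for i in range(n)]


def select_initial_queries(points, k=2, sigma=1.0):
    """Return (cluster labels, indices of the k instances nearest the cut)."""
    f = fiedler_embedding(rbf_affinity(points, sigma))
    labels = [1 if x > 0 else 0 for x in f]  # sign of f gives the two groups
    # Assumption: a small |f| means the instance sits close to the boundary.
    queried = sorted(range(len(points)), key=lambda i: abs(f[i]))[:k]
    return labels, queried


# Toy data: two groups of one-dimensional points with a gap between them.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (5.0,), (6.0,), (7.0,), (8.0,)]
labels, queried = select_initial_queries(points, k=2)
print(labels, queried)
```

In a full pipeline the instances in `queried` would be sent to a human annotator and used to train the initial classifier before the usual active learning loop starts. Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector.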
Pages: 5