Active Learning for Text Classification: Using the LSI Subspace Signature Model

Authors
Zhu, Weizhong [1]
Allen, Robert B. [2]
Affiliations
[1] City Hope Med Ctr, Los Angeles, CA USA
[2] Yonsei Univ, Dept Lib & Informat Sci, Seoul, South Korea
Keywords
active learning; classifiers; Latent Semantic Indexing Subspace Signature Model; text categorization; REGRESSION;
DOI
Not available
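Chinese Library Classification codes
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification codes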
0808 ; 0809 ;
Abstract
Supervised learning methods rely on large sets of labeled training examples. However, large labeled training sets are rare and expensive to create. In this research, the Latent Semantic Indexing Subspace Signature Model (LSISSM) is applied to labeling for active learning on unstructured text. Based on Singular Value Decomposition (SVD), LSISSM represents terms and documents as semantic signatures, defined by the distribution of their local statistical contributions across the top-ranking LSI latent dimensions after dimension reduction. When applied to an unlabeled text corpus, LSISSM finds the most important samples and terms according to their global statistical contribution ranking in the corresponding LSI subspaces, without prior knowledge of labels and without dependence on the model-loss functions of the classifiers. These sample subsets also effectively preserve the sampling distribution of the whole corpus. Furthermore, tests demonstrate that the sample subsets, combined with the optimized term subsets, substantially improve learning accuracy across three standard classifiers.
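The abstract does not give the exact LSISSM formulas, so the following is only a minimal sketch of the general idea it describes: reduce a term-document matrix with a truncated SVD, treat each item's share of weight across the retained latent dimensions as its signature, and rank items by their total contribution. The function names (`lsi_signatures`, `select_top`), the choice of squared, singular-value-scaled coordinates as the "contribution", and the toy matrix are all assumptions made for illustration, not the authors' definitions.

```python
import numpy as np

def lsi_signatures(X, k):
    """Hypothetical helper. X is a (terms x documents) weight matrix;
    k is the number of top LSI latent dimensions to keep."""
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # ASSUMPTION: an item's "local contribution" to a latent dimension is its
    # squared coordinate on that dimension, scaled by the singular value.
    doc_contrib = ((s_k[:, None] * Vt_k) ** 2).T   # shape (n_docs, k)
    term_contrib = (U_k * s_k) ** 2                # shape (n_terms, k)

    # ASSUMPTION: the "global contribution" is the sum over kept dimensions.
    doc_total = doc_contrib.sum(axis=1)
    term_total = term_contrib.sum(axis=1)

    # Signature: distribution of an item's contribution across dimensions.
    eps = 1e-12  # avoids division by zero for items with no contribution
    doc_sig = doc_contrib / (doc_total[:, None] + eps)
    term_sig = term_contrib / (term_total[:, None] + eps)
    return doc_sig, term_sig, doc_total, term_total

def select_top(scores, n):
    """Indices of the n highest-scoring items, best first."""
    return np.argsort(scores)[::-1][:n]

# Toy corpus: 5 terms x 4 documents; the last document is empty.
X = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 2.0, 0.0],
])
doc_sig, term_sig, doc_total, term_total = lsi_signatures(X, k=2)
docs_to_label = select_top(doc_total, 2)    # candidate samples for labeling
terms_to_keep = select_top(term_total, 3)   # candidate term subset
```

In this sketch no labels and no classifier loss are involved in the ranking, which mirrors the label-free selection the abstract describes; the selected documents would then be labeled and passed, restricted to the selected terms, to whichever classifier is being trained.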
Pages: 149-155
Page count: 7