Active Learning for Text Classification: Using the LSI Subspace Signature Model

Cited by: 0
Authors
Zhu, Weizhong [1]
Allen, Robert B. [2]
Affiliations
[1] City Hope Med Ctr, Los Angeles, CA USA
[2] Yonsei Univ, Dept Lib & Informat Sci, Seoul, South Korea
Keywords
active learning; classifiers; Latent Semantic Indexing Subspace Signature Model; text categorization; REGRESSION;
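DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline classification codes
0808; 0809
Abstract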
Supervised learning methods rely on large sets of labeled training examples. However, large training sets are rare, and creating them is expensive. In this research, the Latent Semantic Indexing Subspace Signature Model (LSISSM) is applied to labeling for active learning of unstructured text. Based on Singular Value Decomposition (SVD), LSISSM represents terms and documents as semantic signatures, defined by the distribution of their local statistical contributions across the top-ranking LSI latent dimensions after dimension reduction. When applied to an unlabeled text corpus, LSISSM finds the most important samples and terms according to the ranking of their global statistical contributions in the corresponding LSI subspaces, without prior knowledge of labels and without dependence on the model-loss functions of the classifiers. These sample subsets also effectively maintain the sampling distribution of the whole corpus. Furthermore, tests demonstrate that the sample subsets, combined with the optimized term subsets, substantially improve learning accuracy across three standard classifiers.
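
The abstract does not give the exact formulas behind the signatures or the contribution ranking, so the following is only a minimal sketch of the general idea, not the authors' LSISSM implementation. It assumes that an item's "local contribution" to a latent dimension is its squared, singular-value-weighted coordinate, that its "signature" is the normalized distribution of those contributions over the top k dimensions, and that its "global contribution" is their sum. The function names `lsi_signatures` and `top_items` are hypothetical.

```python
import numpy as np


def lsi_signatures(X, k):
    """Sketch of LSI-style signatures for a term-by-document matrix X.

    Assumption (not taken from the paper): the local contribution of an
    item to a latent dimension is its squared coordinate after scaling
    by the singular value.
    Returns {"terms": (signatures, scores), "docs": (signatures, scores)}.
    """
    # Thin SVD; NumPy returns singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))  # keep only the top-k latent dimensions

    term_coords = U[:, :k] * s[:k]     # shape: (n_terms, k)
    doc_coords = Vt[:k, :].T * s[:k]   # shape: (n_docs, k)

    result = {}
    for name, coords in (("terms", term_coords), ("docs", doc_coords)):
        contrib = coords ** 2                         # local contributions
        totals = contrib.sum(axis=1, keepdims=True)   # global contribution
        # Signature: distribution of contributions across the k dimensions.
        # Items with no mass in the subspace get an all-zero signature.
        signature = np.divide(
            contrib, totals, out=np.zeros_like(contrib), where=totals > 0
        )
        result[name] = (signature, totals.ravel())
    return result


def top_items(scores, n):
    """Indices of the n items with the largest global contribution."""
    return np.argsort(-scores, kind="stable")[:n]


if __name__ == "__main__":
    # Toy corpus: 4 terms (rows) by 4 documents (columns).
    X = np.array(
        [[2, 0, 1, 0],
         [1, 0, 0, 0],
         [0, 3, 0, 1],
         [0, 1, 0, 0]],
        dtype=float,
    )
    doc_signatures, doc_scores = lsi_signatures(X, k=2)["docs"]
    print("documents to label first:", top_items(doc_scores, 2))
```

Under these assumptions, the documents returned by `top_items` would be the candidates sent for labeling, and the same ranking applied to the `"terms"` entry would give a reduced term subset. No labels and no classifier loss are involved at any step, which matches the property the abstract emphasizes.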
Pages: 149-155
Page count: 7