Active Learning for Text Classification: Using the LSI Subspace Signature Model

Times cited: 0
Authors
Zhu, Weizhong [1]
Allen, Robert B. [2]
Affiliations
[1] City Hope Med Ctr, Los Angeles, CA USA
[2] Yonsei Univ, Dept Lib & Informat Sci, Seoul, South Korea
Keywords
active learning; classifiers; Latent Semantic Indexing Subspace Signature Model; text categorization; REGRESSION;
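DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract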
Supervised learning methods rely on large sets of labeled training examples. However, large training sets are rare, and creating them is expensive. In this research, the Latent Semantic Indexing Subspace Signature Model (LSISSM) is applied to labeling for active learning of unstructured text. Based on Singular Value Decomposition (SVD), LSISSM represents terms and documents as semantic signatures given by the distribution of their local statistical contributions across the top-ranking LSI latent dimensions after dimension reduction. When applied to an unlabeled text corpus, LSISSM finds the most important samples and terms according to their global statistical contribution ranking in the corresponding LSI subspaces, without prior knowledge of labels and without dependence on the model-loss functions of the classifiers. These sample subsets also effectively maintain the sampling distribution of the whole corpus. Furthermore, tests demonstrate that the sample subsets with the optimized term subsets substantially improve the learning accuracy across three standard classifiers.
Pages: 149 - 155
Number of pages: 7
Related papers
  • [1] Document clustering using the LSI subspace signature model
    Zhu, W. Z.
    Allen, R. B.
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2013, 64 (04): 844 - 860
  • [2] Using LSI and its variants in Text Classification
    Batra, Shalini
    Bawa, Seema
    ADVANCES TECHNIQUES IN COMPUTING SCIENCES AND SOFTWARE ENGINEERING, 2010, : 313 - 316
  • [3] Using Active Learning in Text Classification of Quranic Sciences
    Goudjil, Mohamed
    Bedda, Mouldi
    Koudil, Mouloud
    Ghoggali, Noureddine
    2013 TAIBAH UNIVERSITY INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION TECHNOLOGY FOR THE HOLY QURAN AND ITS SCIENCES, 2013, : 209 - 213
  • [4] Text classification with active learning
    Novak, B
    Mladenic, D
    Grobelnik, M
    FROM DATA AND INFORMATION ANALYSIS TO KNOWLEDGE ENGINEERING, 2006, : 398 - +
  • [5] A Novel Active Learning Method Using SVM for Text Classification
    Goudjil M.
    Koudil M.
    Bedda M.
    Ghoggali N.
    International Journal of Automation and Computing, 2018, 15 (03): 290 - 298
  • [6] Active learning for text classification with reusability
    Hu, Rong
    Mac Namee, Brian
    Delany, Sarah Jane
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 45 : 438 - 449
  • [7] Active Learning for Turkish Text Classification
    Sapci, Ali Osman Berk
    Tastan, Oznur
    Yeniterzi, Reyyan
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [8] Deep Active Learning for Text Classification
    An, Bang
    Wu, Wenjun
    Han, Huimin
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON VISION, IMAGE AND SIGNAL PROCESSING (ICVISP 2018), 2018,
  • [9] Scalable Arabic text Classification Using Machine Learning Model
    Al Mgheed, Rahaf M.
    2021 12TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2021, : 483 - 485
  • [10] MII: A Novel Text Classification Model Combining Deep Active Learning with BERT
    Zhang, Anman
    Li, Bohan
    Wang, Wenhuan
    Wan, Shuo
    Chen, Weitong
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (03): 1499 - 1514