Scalable Semi-Supervised Query Classification Using Matrix Sketching

Times cited: 0
Authors
Kim, Young-Bum [1]
Stratos, Karl [2]
Sarikaya, Ruhi [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
[2] Columbia Univ, New York, NY USA
DOI
Not available
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The enormous scale of unlabeled text available today necessitates scalable schemes for representation learning in natural language processing. In this paper, for example, we are interested in classifying the intent of a user query. While our labeled data is quite limited, we have access to a virtually unlimited amount of unlabeled queries, which could be used to induce useful representations, for instance by principal component analysis (PCA). However, the data is too large even to store in memory, let alone to process with conventional batch algorithms. In this work, we apply the recently proposed matrix sketching algorithm (Liberty, 2013) to obviate the scalability problem entirely. This algorithm approximates the data within a specified memory bound while preserving the covariance structure necessary for PCA. Using matrix sketching, we significantly improve user intent classification accuracy by leveraging large amounts of unlabeled queries.
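The abstract names the technique but not its mechanics. As a rough illustration only, here is a minimal NumPy sketch in the spirit of the Frequent Directions algorithm of Liberty (2013). It streams query vectors (rows of a matrix A, n x d) through a fixed ell x d buffer B and, whenever the buffer fills, shrinks its singular values so that B.T @ B stays close to A.T @ A using O(ell * d) memory. The function names `frequent_directions` and `pca_features`, the shrinkage index `ell // 2`, and the use of uncentered second moments in place of a mean-centered covariance are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def frequent_directions(rows, ell):
    """Stream the rows of A (n x d) into an ell x d sketch B so that
    B.T @ B approximates A.T @ A. Assumes 2 <= ell <= d.
    Hypothetical helper in the spirit of Liberty (2013)."""
    B = None
    next_zero = 0  # index of the next all-zero row in B
    for a in rows:
        a = np.asarray(a, dtype=float)
        if B is None:
            d = a.shape[0]
            if not 2 <= ell <= d:
                raise ValueError("expected 2 <= ell <= d")
            B = np.zeros((ell, d))
        if next_zero == ell:
            # Sketch is full: rotate onto its right singular vectors and
            # subtract delta from every squared singular value. This zeroes
            # rows ell // 2 onward, freeing space for incoming queries.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            B = np.sqrt(np.maximum(s ** 2 - delta, 0.0))[:, None] * Vt
            next_zero = ell // 2
        B[next_zero] = a
        next_zero += 1
    return B


def pca_features(B, X, k):
    """Project vectors X (m x d) onto the top-k right singular vectors of
    the sketch B, i.e. approximate (uncentered) principal directions.
    Hypothetical helper for illustration."""
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return X @ Vt[:k].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 50))  # stand-in for unlabeled query vectors
    B = frequent_directions(A, ell=20)
    err = np.linalg.norm(A.T @ A - B.T @ B, 2)
    print(err <= 2 * np.linalg.norm(A, "fro") ** 2 / 20)  # True
```

For this variant, each shrink removes at most delta from any direction while reducing the sketch's squared Frobenius norm by more than (ell / 2) * delta. As a result, A.T @ A - B.T @ B stays positive semidefinite and its spectral norm is at most 2 * ||A||_F^2 / ell, the same form as the bound in Liberty (2013). The projected features from `pca_features` would then feed whatever intent classifier is trained on the limited labeled data; the classifier is not specified here.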
Pages: 8-13
Page count: 6