Improving automatic query classification via semi-supervised learning

Cited by: 31
Authors
Beitzel, SM
Jensen, EC
Frieder, O
Lewis, DD
Chowdhury, A
Kolcz, A
DOI
10.1109/ICDM.2005.80
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose web search systems. Such classification becomes critical if the system is to return results not just from a general web collection but from topic-specific back-end databases as well. Maintaining sufficient classification recall is very difficult as web queries are typically short, yielding few features per query. This feature sparseness, coupled with the high query volumes typical for a large-scale search service, makes manual and supervised learning approaches alone insufficient. We use an application of computational linguistics to develop an approach for mining the vast amount of unlabeled data in web query logs to improve automatic topical web query classification. We show that our approach, in combination with manual matching and supervised learning, allows us to classify a substantially larger proportion of queries than any single technique. We examine the performance of each approach on a real web query stream and show that our combined method accurately classifies 46% of queries, outperforming the recall of the best single approach by nearly 20% with a 7% improvement in overall effectiveness.
Pages: 42-49
Page count: 8