Co-clustering based Classification for Out-of-domain Documents

Times cited: 0
Authors
Dai, Wenyuan [1]
Xue, Gui-Rong [1]
Yang, Qiang [2]
Yu, Yong [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200030, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
Keywords
Classification; Co-clustering; Out-of-domain; Kullback-Leibler divergence
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many real-world applications, labeled data are in short supply. Obtaining labeled data in a new domain is often expensive and time-consuming, while plenty of labeled data may be available from a related but different domain. Traditional machine learning does not cope well with learning across different domains. In this paper, we address this problem for a text-mining task in which the labeled data follow one distribution in one domain, known as in-domain data, while the unlabeled data come from a related but different domain, known as out-of-domain data. Our general goal is to learn from the in-domain data and apply the learned knowledge to the out-of-domain data. We propose a co-clustering based classification (CoCC) algorithm to tackle this problem. Co-clustering is used as a bridge to propagate the class structure and knowledge from the in-domain data to the out-of-domain data. We present theoretical and empirical analysis showing that our algorithm produces high-quality classification results even when the distributions of the two data sets differ. The experimental results show that our algorithm greatly improves classification performance over traditional learning algorithms.
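The abstract does not spell out CoCC's update rules. To make the core idea concrete (class structure estimated from labeled in-domain text, carried over to unlabeled out-of-domain documents through Kullback-Leibler divergence), here is a minimal sketch under stated assumptions. It is deliberately simplified and is not the authors' algorithm: it clusters only documents (no word clustering, so it is one-sided rather than co-clustering), and the function names, add-one smoothing, and the iterative re-estimation loop are all illustrative assumptions.

```python
import math
from collections import Counter


def word_distribution(counts, vocab, alpha=1.0):
    """Add-alpha smoothed p(w) over a fixed vocabulary (assumed smoothing scheme)."""
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; q must be positive wherever p is positive."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def classify_out_of_domain(in_docs, in_labels, out_docs, n_iter=10, alpha=1.0):
    """Hypothetical one-sided sketch: label each out-of-domain document with the
    class whose word distribution is closest in KL divergence, then re-estimate
    class distributions using those labels. Documents are non-empty token lists."""
    vocab = sorted({w for doc in in_docs + out_docs for w in doc})
    classes = sorted(set(in_labels))

    # Class-word counts from the labeled in-domain data only.
    in_counts = {c: Counter() for c in classes}
    for doc, c in zip(in_docs, in_labels):
        in_counts[c].update(doc)

    # Empirical (unsmoothed) word distribution of each out-of-domain document.
    doc_dists = []
    for doc in out_docs:
        counts = Counter(doc)
        doc_dists.append({w: n / len(doc) for w, n in counts.items()})

    assign = None
    for _ in range(n_iter):
        # Pool in-domain counts with out-of-domain documents assigned so far.
        counts = {c: Counter(in_counts[c]) for c in classes}
        if assign is not None:
            for doc, c in zip(out_docs, assign):
                counts[c].update(doc)
        class_dists = {c: word_distribution(counts[c], vocab, alpha) for c in classes}

        new_assign = [
            min(classes, key=lambda c: kl_divergence(p, class_dists[c]))
            for p in doc_dists
        ]
        if new_assign == assign:  # assignments stable: stop early
            break
        assign = new_assign
    return assign


if __name__ == "__main__":
    in_docs = [["ball", "game", "team"], ["game", "score", "team"],
               ["stock", "market", "price"], ["market", "trade", "price"]]
    in_labels = ["sports", "sports", "finance", "finance"]
    out_docs = [["team", "match", "game"], ["price", "shares", "market"]]
    print(classify_out_of_domain(in_docs, in_labels, out_docs))
```

On the toy data in the `__main__` block this prints `['sports', 'finance']`. A design note: because each document's distribution is left unsmoothed, minimizing KL(p(w|d) || p(w|c)) over classes ranks classes the same way as a multinomial naive Bayes log-likelihood without a class prior; the re-estimation step is what lets out-of-domain vocabulary (here "match" and "shares") start to shape the class distributions.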
Pages: 210 / +
Page count: 3