Improving text categorization using domain knowledge

Cited by: 0
Authors
Zhu, JB [1]
Chen, WL [1]
Affiliation
[1] Northeastern Univ, Nat Language Proc Lab, Inst Comp Software & Theory, Shenyang 110004, Peoples R China
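Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract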
This paper proposes an approach to improving document classification using domain knowledge. We first introduce a domain knowledge dictionary, NEUKD, and propose two models that use domain knowledge as textual features for text categorization. The first, the BOTW model, uses domain-associated terms and conventional words as textual features. The second, the BOF model, uses domain features as textual features. Because the size of the domain knowledge dictionary is limited, we apply a machine learning technique to address this limitation and propose the BOL model, which can be regarded as an extended version of the BOF model. In the comparison experiments, a naive Bayes system based on the BOW model serves as the baseline. Results for naive Bayes systems based on the four models (BOW, BOTW, BOF, and BOL) show that domain knowledge is very useful for improving text categorization. The BOTW model outperforms the BOW model, and the BOL and BOF models outperform the BOW model when the number of features is small. By learning new features with a machine learning technique, the BOL model outperforms the BOF model.
Pages: 103 - 113
Page count: 11
Related papers
50 in total
  • [1] Improving Text Categorization with Semantic Knowledge in Wikipedia
    Wang, Xiang
    Jia, Yan
    Chen, Ruhua
    Fan, Hua
    Zhou, Bin
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (12) : 2786 - 2794
  • [2] Improving text categorization using the importance of sentences
    Ko, Y
    Park, J
    Seo, J
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2004, 40 (01) : 65 - 79
  • [3] Improving the Performance of Text Categorization using Automatic Summarization
    Jiang Xiao-Yu
    Fan Xiao-Zhong
    Wang Zhi-Fei
    Jia Ke-Liang
    [J]. 2009 INTERNATIONAL CONFERENCE ON COMPUTER MODELING AND SIMULATION, PROCEEDINGS, 2009, : 347 - +
  • [4] Improving Arabic Text Categorization using Decision Trees
    Harrag, Fouzi
    El-Qawasmeh, Eyas
    Pichappan, Pit
    [J]. NDT: 2009 FIRST INTERNATIONAL CONFERENCE ON NETWORKED DIGITAL TECHNOLOGIES, 2009, : 110 - +
  • [5] Using bigrams detection for text categorization in scientific domain
    Montejo Raez, Arturo
    Perea Ortega, Jose Manuel
    Martin Valdivia, Maria Teresa
    Urena Lopez, L. Alfonso
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (44) : 91 - 98
  • [6] Feature Generation for Text Categorization Using World Knowledge
    Gabrilovich, Evgeniy
    Markovitch, Shaul
    [J]. 19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005, : 1048 - 1053
  • [7] Improving text categorization using the importance of words in different categories
    Deng, ZH
    Zhang, M
    [J]. COMPUTATIONAL INTELLIGENCE AND SECURITY, PT 1, PROCEEDINGS, 2005, 3801 : 458 - 463
  • [8] Exploiting Domain Knowledge via Grouped Weight Sharing with Application to Text Categorization
    Zhang, Ye
    Lease, Matthew
    Wallace, Byron C.
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 155 - 160
  • [9] Weakly Supervised Short Text Categorization Using World Knowledge
    Tuerker, Rima
    Zhang, Lei
    Alam, Mehwish
    Sack, Harald
    [J]. SEMANTIC WEB - ISWC 2020, PT I, 2020, 12506 : 584 - 600
  • [10] Text categorization based on domain ontology
    He, QM
    Qiu, L
    Zhao, GT
    Wang, SK
    [J]. WEB INFORMATION SYSTEMS - WISE 2004, PROCEEDINGS, 2004, 3306 : 319 - 324