Using seed words to learn to categorize Chinese text

Authors
Zhu, JB [1 ]
Chen, WL [1 ]
Yao, TS [1 ]
Affiliation
[1] Northeastern Univ, Inst Comp Software & Theory, Nat Language Proc Lab, Shenyang 110004, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we focus on text categorization with unsupervised learning techniques that do not require labeled data. We propose a feature learning bootstrapping (FLB) algorithm that starts from a small number of seed words and automatically learns features for each category from a large amount of unlabeled documents. Using these learned features, we develop a new Naive Bayes classifier named NB_FLB. Experimental results show that the NB_FLB classifier performs better than other Naive Bayes classifiers trained by supervised learning when the number of features is small.
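The abstract does not give the FLB scoring function, stopping rule, or the exact form of NB_FLB, so the following is only a minimal sketch of the general idea (seed words, bootstrapped features, then a Naive Bayes classifier), not the authors' method. Everything in it is an assumption made for illustration: the function names `pseudo_label`, `bootstrap_features`, `train_nb`, and `classify`, the document-frequency difference used to rank candidate words, the `rounds` and `top_k` parameters, the add-one smoothing, and the toy data. Documents are assumed to be already segmented into word tokens.

```python
import math
from collections import Counter


def pseudo_label(doc, features):
    """Return the category whose feature words match the document most often,
    or None when nothing matches or the top two categories tie."""
    hits = {c: sum(1 for t in doc if t in ws) for c, ws in features.items()}
    ranked = sorted(hits, key=hits.get, reverse=True)
    top = ranked[0]
    if hits[top] == 0:
        return None
    if len(ranked) > 1 and hits[ranked[1]] == hits[top]:
        return None
    return top


def bootstrap_features(docs, seeds, rounds=2, top_k=1):
    """Grow each category's feature set from its seed words (assumed scoring:
    document frequency inside the category minus document frequency outside)."""
    features = {c: set(ws) for c, ws in seeds.items()}
    for _ in range(rounds):
        # Pseudo-label the unlabeled documents with the current features.
        assigned = {c: [] for c in features}
        for doc in docs:
            label = pseudo_label(doc, features)
            if label is not None:
                assigned[label].append(doc)
        # Promote the best-scoring new words of each category to features.
        for c in features:
            inside = Counter(t for d in assigned[c] for t in set(d))
            outside = Counter(
                t for o in features if o != c for d in assigned[o] for t in set(d)
            )
            taken = set().union(*features.values())
            candidates = [(inside[w] - outside[w], w) for w in inside if w not in taken]
            candidates.sort(key=lambda p: (-p[0], p[1]))
            features[c].update(w for score, w in candidates[:top_k] if score > 0)
    return features


def train_nb(docs, features):
    """Multinomial Naive Bayes over the learned features only, trained on
    pseudo-labeled documents with add-one smoothing (an assumed design)."""
    vocab = set().union(*features.values())
    counts = {c: Counter() for c in features}
    ndocs = Counter()
    for doc in docs:
        label = pseudo_label(doc, features)
        if label is None:
            continue
        ndocs[label] += 1
        counts[label].update(t for t in doc if t in vocab)
    total = sum(ndocs.values())
    model = {}
    for c in features:
        denom = sum(counts[c].values()) + len(vocab)
        log_prior = math.log((ndocs[c] + 1) / (total + len(features)))
        log_like = {w: math.log((counts[c][w] + 1) / denom) for w in vocab}
        model[c] = (log_prior, log_like)
    return model


def classify(model, doc):
    """Pick the category with the highest log posterior; words outside the
    learned feature set are ignored."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in doc if t in log_like)
    return max(model, key=score)


# Toy, pre-segmented corpus (hypothetical data, English tokens for readability).
docs = [
    ["football", "match", "goal"],
    ["football", "goal", "team"],
    ["match", "team", "goal"],
    ["stock", "market", "price"],
    ["stock", "price", "bank"],
    ["market", "bank", "price"],
]
seeds = {"sports": {"football"}, "finance": {"stock"}}

features = bootstrap_features(docs, seeds)
model = train_nb(docs, features)
print(features)
print(classify(model, ["team", "goal", "match"]))
```

In this sketch only the seed words carry category information at the start. Documents that match no feature, or match two categories equally, are left unlabeled for that round, which is one simple way to keep early mistakes from spreading to later rounds.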
Pages: 464-473
Number of pages: 10