Using seed words to learn to categorize Chinese text

Cited by: 0
Authors
Zhu, JB [1 ]
Chen, WL [1 ]
Yao, TS [1 ]
Affiliation
[1] Northeastern Univ, Inst Comp Software & Theory, Nat Language Proc Lab, Shenyang 110004, Peoples R China
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we focus on text categorization by unsupervised learning techniques that do not require labeled data. We propose a feature learning bootstrapping (FLB) algorithm that starts from a small number of seed words and automatically learns features for each category from a large amount of unlabeled documents. Using these learned features, we develop a new Naive Bayes classifier named NB_FLB. Experimental results show that the NB_FLB classifier performs better than supervised Naive Bayes classifiers when the number of features is small.
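The abstract gives no algorithmic detail, so the following is only a minimal sketch of how seed-word bootstrapping followed by a Naive Bayes classifier might look. It is not the authors' FLB or NB_FLB procedure. The co-occurrence scoring rule, the pseudo-labeling step, the function names (`bootstrap_features`, `train_nb`, `classify`), the parameters `rounds` and `top_k`, and the toy data are all assumptions made for illustration. Documents are assumed to be already segmented into word tokens.

```python
import math
from collections import Counter


def bootstrap_features(docs, seeds, rounds=2, top_k=2):
    """Grow a feature set per category from seed words (hypothetical scoring rule)."""
    features = {c: set(ws) for c, ws in seeds.items()}
    for _ in range(rounds):
        # Score each word by how strongly it co-occurs with a category's current features.
        scores = {c: Counter() for c in features}
        for doc in docs:
            tokens = set(doc)
            for c, feats in features.items():
                hits = len(tokens & feats)
                if hits:
                    for w in tokens:
                        scores[c][w] += hits
        known = set().union(*features.values())
        additions = {}
        for c in features:
            margins = []
            for w, s in scores[c].items():
                if w in known:
                    continue
                rival = max((scores[o][w] for o in features if o != c), default=0)
                if s > rival:  # keep only words that favor this category
                    margins.append((s - rival, w))
            margins.sort(key=lambda m: (-m[0], m[1]))
            additions[c] = [w for _, w in margins[:top_k]]
        for c, ws in additions.items():
            features[c].update(ws)
    return features


def train_nb(docs, features):
    """Multinomial Naive Bayes over the learned features, trained on pseudo-labels."""
    vocab = set().union(*features.values())
    counts = {c: Counter() for c in features}
    doc_counts = Counter()
    for doc in docs:
        hits = {c: sum(1 for w in doc if w in feats) for c, feats in features.items()}
        best = max(hits, key=hits.get)
        # Skip documents with no feature match or an ambiguous (tied) match.
        if hits[best] == 0 or list(hits.values()).count(hits[best]) > 1:
            continue
        doc_counts[best] += 1
        counts[best].update(w for w in doc if w in vocab)
    total_docs = sum(doc_counts.values())
    model = {}
    for c in features:
        # Laplace smoothing for both the prior and the word likelihoods.
        prior = math.log((doc_counts[c] + 1) / (total_docs + len(features)))
        denom = sum(counts[c].values()) + len(vocab)
        likelihood = {w: math.log((counts[c][w] + 1) / denom) for w in vocab}
        model[c] = (prior, likelihood)
    return model


def classify(model, doc):
    """Return the category with the highest log-posterior; unknown words are ignored."""
    def score(c):
        prior, likelihood = model[c]
        return prior + sum(likelihood[w] for w in doc if w in likelihood)
    return max(model, key=score)


# Toy, already-segmented corpus (illustrative data only).
docs = [
    ["football", "match", "goal", "team"],
    ["team", "goal", "coach", "league"],
    ["football", "league", "coach"],
    ["stock", "market", "bank", "shares"],
    ["bank", "shares", "investor", "profit"],
    ["stock", "profit", "investor"],
]
seeds = {"sports": {"football"}, "finance": {"stock"}}

features = bootstrap_features(docs, seeds)
model = train_nb(docs, features)
print(classify(model, ["coach", "goal"]))
print(classify(model, ["bank", "profit"]))
```

In this sketch the margin test (a word must score higher for one category than for every other) is one simple way to keep the learned feature sets disjoint. A real system would likely need a more careful selection criterion and a stopping rule.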
Pages: 464-473
Number of pages: 10