Automatic generation of text categorization rules in a hybrid method based on machine learning

Authors
Lana-Serrano, Sara [1]
Villena-Roman, Julio [2]
Collada-Perez, Sonia [3]
Carlos Gonzalez-Cristobal, Jose [4]
Affiliations
[1] Univ Politecn Madrid, Crta Valencia Km 7, E-28031 Madrid, Spain
[2] Univ Carlos III Madrid, E-28911 Leganes, Spain
[3] DAEDALUS, E-28031 Madrid, Spain
[4] Univ Politecn Madrid, E-28040 Madrid, Spain
Keywords
Text categorization; machine learning; rule-based system; kNN; Reuters-21578; mutual information; automatic rule generation; evaluation
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This paper discusses several techniques for the automatic generation of rules to be used in a novel hybrid method for text categorization. The approach combines a machine learning algorithm with rule-based expert systems applied in cascade, which filter and re-rank the output of the base model provided by the preceding classifier. The paper describes an implementation based on the kNN algorithm and a basic rule language that expresses lists of terms appearing in the text. The popular Reuters-21578 news corpus is used for testing. Results show that the proposed methods for automatic rule generation achieve precision values very similar to those achieved by manually defined rule sets, and that the hybrid approach achieves a precision comparable to that of other state-of-the-art methods.
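The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that a kNN base model is followed by rules expressing lists of terms, so everything below is an assumption made for illustration, including the function names (`knn_rank`, `apply_rules`), the positive/negative term semantics, the `boost` value, and the toy training data.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_rank(text, training, k=3):
    """Base model: score each category by the summed similarity
    of the k nearest training documents, highest score first."""
    query = bag_of_words(text)
    neighbours = sorted(
        ((cosine(query, bag_of_words(doc)), cat) for doc, cat in training),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, cat in neighbours:
        scores[cat] += sim
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def apply_rules(text, ranked, rules, boost=1.0):
    """Rule stage (hypothetical semantics): drop a category when one of its
    negative terms occurs in the text, raise its score when one of its
    positive terms occurs, then re-rank the surviving categories."""
    tokens = set(text.lower().split())
    kept = []
    for cat, score in ranked:
        rule = rules.get(cat, {})
        if tokens & set(rule.get("negative", ())):
            continue  # filtered out by the rule
        if tokens & set(rule.get("positive", ())):
            score += boost
        kept.append((cat, score))
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Toy data, invented for this sketch (not taken from Reuters-21578).
training = [
    ("wheat harvest exports rise", "grain"),
    ("corn and wheat prices fall", "grain"),
    ("crude oil prices rise", "crude"),
    ("oil output cut by producers", "crude"),
]
rules = {
    "grain": {"positive": ["wheat", "corn"], "negative": ["oil"]},
    "crude": {"positive": ["oil", "crude"], "negative": []},
}

text = "oil prices rise as wheat exports fall"
ranked = knn_rank(text, training, k=3)   # output of the base model
final = apply_rules(text, ranked, rules)  # filtered and re-ranked output
```

In this toy run the base model ranks "grain" first, and the rule stage then removes it because the text contains a term from that category's negative list. This is the kind of filtering and re-ranking effect the abstract attributes to the rule-based stage.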
Pages: 231-237
Number of pages: 7