Feature reinforcement approach to poly-lingual text categorization

Authors
Wei, Chih-Ping [1]
Shi, Huihua [2]
Yang, Christopher C. [3]
Affiliations
[1] Natl Tsing Hua Univ, Inst Technol Management, Hsinchu, Taiwan
[2] Natl Sun Yat Sen Univ, Dept Informat Management, Kaohsiung 80424, Taiwan
[3] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Sha Tin, Peoples R China
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid emergence and proliferation of the Internet and the trend of globalization, a tremendous number of textual documents written in different languages are electronically accessible online. Poly-lingual text categorization (PLTC) refers to the automatic learning of one or more text categorization models from a set of preclassified training documents written in different languages and the subsequent assignment of unclassified poly-lingual documents to predefined categories on the basis of the induced models. Although PLTC can be approached as multiple independent monolingual text categorization problems, this naive approach uses only the training documents of one language to construct each monolingual classifier and fails to exploit the opportunity offered by poly-lingual training documents. In this study, we propose a feature reinforcement approach to PLTC that takes into account the training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as the performance benchmark, our empirical evaluation shows that the proposed PLTC technique achieves higher classification accuracy than the benchmark on both the English and Chinese corpora.
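The naive baseline described in the abstract, one independent classifier per language, can be sketched as follows. This is only an illustration under stated assumptions: the term-count scoring rule, the class and function names (`MonolingualClassifier`, `train_mntc`, `classify`), and the toy documents are all hypothetical and are not taken from the paper. The feature reinforcement step itself is not specified in the abstract, so it is not shown.

```python
from collections import Counter, defaultdict


class MonolingualClassifier:
    """Toy classifier (an assumption, not the paper's learner): a category's
    score is the summed training frequency of the document's tokens."""

    def __init__(self):
        # category -> term frequencies observed in that category's training docs
        self.term_counts = defaultdict(Counter)

    def fit(self, docs):
        # docs: iterable of (tokens, category) pairs in a single language
        for tokens, category in docs:
            self.term_counts[category].update(tokens)
        return self

    def predict(self, tokens):
        scores = {
            category: sum(counts[t] for t in tokens)
            for category, counts in self.term_counts.items()
        }
        return max(scores, key=scores.get)


def train_mntc(training_docs):
    """Build one classifier per language, each trained only on documents of
    that language (the independent monolingual baseline)."""
    by_language = defaultdict(list)
    for language, tokens, category in training_docs:
        by_language[language].append((tokens, category))
    return {
        language: MonolingualClassifier().fit(docs)
        for language, docs in by_language.items()
    }


def classify(models, language, tokens):
    # Route the document to the classifier of its own language.
    return models[language].predict(tokens)


# Hypothetical toy corpus: (language, tokens, category)
training = [
    ("en", ["stock", "market", "bank"], "finance"),
    ("en", ["match", "goal", "team"], "sports"),
    ("zh", ["股票", "银行"], "finance"),
    ("zh", ["比赛", "球队"], "sports"),
]
models = train_mntc(training)
```

In this sketch the English classifier never sees the Chinese training documents and vice versa, which is the limitation the proposed approach is meant to address.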
Pages: 99 / +
Page count: 2