Text Categorization for Vietnamese Documents

Authors
Nguyen, Giang-Son [1]
Gao, Xiaoying [1]
Andreae, Peter [1]
Affiliation
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many machine learning methods have been proposed for text categorization, but most research has applied them to English documents. Vietnamese is a different language with different features, and it is not clear whether the standard methods will work on the categorization of Vietnamese documents. This paper describes morphological-level document representations that are appropriate for Vietnamese text documents and investigates the effectiveness of several standard learning algorithms, including Naive Bayes, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) with four different kernel functions. The results show that it is possible to build effective and efficient classifiers for Vietnamese text categorization using our representations and the standard algorithms, and demonstrate that the performance can be improved by using infogain for feature selection and using an external dictionary for filtering the vocabulary.
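The feature selection step mentioned in the abstract can be illustrated with a minimal sketch of information gain (infogain) ranking combined with an optional dictionary filter. This is not the authors' implementation: the whitespace tokenization, the toy documents and labels, and the names `entropy`, `information_gain` and `select_features` are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, labels, k, dictionary=None):
    """Return the k terms with the highest information gain.

    If `dictionary` is given, the vocabulary is first restricted to terms
    that appear in it (a stand-in for an external dictionary filter).
    """
    vocab = set().union(*docs)
    if dictionary is not None:
        vocab &= dictionary
    # Sort by descending gain; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of whitespace-separated tokens.
texts = [
    "bóng đá việt nam",
    "đội bóng thắng trận",
    "giá vàng tăng mạnh",
    "thị trường vàng giảm",
]
docs = [set(t.split()) for t in texts]
labels = ["sport", "sport", "economy", "economy"]

top_terms = select_features(docs, labels, k=2)
print(top_terms)
```

The selected terms would then define the feature vectors passed to a classifier such as Naive Bayes, KNN or an SVM. Representing each document as a set of tokens keeps the sketch short; a term-frequency representation would need a different split criterion.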
Pages: 466-469
Page count: 4