Vietnamese Part of Speech Tagging Based on Multi-category Words Disambiguation Model

Authors
Zhao Chen [1]
Liu Yanchao [1]
Guo Jianyi [1, 2]
Chen Wei [1, 2]
Yan Xin [1, 2]
Yu Zhengtao [1, 2]
Chen Xiuqin [3]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China
[2] Kunming Univ Sci & Technol, Key Lab Intelligent Informat Proc, Kunming 650500, Yunnan, Peoples R China
[3] Kunming Univ Sci & Technol Int Educ, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-category words disambiguation; Vietnamese; Part of Speech dictionary; POS tagging;
DOI
10.1007/978-3-319-73618-1_23
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
POS tagging is a fundamental task in Natural Language Processing that determines the quality of subsequent processing, and the ambiguity of multi-category words directly affects the accuracy of Vietnamese POS tagging. POS tagging for English and Chinese has already achieved good results, but the accuracy of Vietnamese POS tagging still needs improvement. To address this problem, this paper proposes a novel method of Vietnamese POS tagging based on a multi-category words disambiguation model and a Part of Speech dictionary. A multi-category words dictionary and a non-multi-category words dictionary are generated from the Vietnamese dictionary and used to build a POS tagging corpus. From this corpus, 396,946 multi-category words are extracted, and a maximum entropy disambiguation model for Vietnamese parts of speech is constructed by statistical methods; the multi-category words and the non-multi-category words are then tagged on this basis. Experimental results show that the proposed method outperforms the existing model, which indicates that the method is feasible and effective.
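The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary, the tag labels (N, V, P, C), the context features (current, previous, and next word), and the training settings are all assumptions made for illustration, since the abstract does not specify them. The sketch splits a POS dictionary into multi-category and non-multi-category words, tags the latter by lookup, and resolves the former with a small maximum entropy (softmax) model restricted to the tags the dictionary allows.

```python
import math
from collections import defaultdict

# Hypothetical toy POS dictionary: word -> set of possible tags (illustrative only).
POS_DICT = {
    "tôi": {"P"}, "cái": {"C"}, "này": {"P"}, "việc": {"N"},
    "bàn": {"N", "V"},  # ambiguous: "table" (N) or "to discuss" (V)
}

def split_dictionary(pos_dict):
    """Separate multi-category words from words with exactly one tag."""
    multi = {w: t for w, t in pos_dict.items() if len(t) > 1}
    single = {w: next(iter(t)) for w, t in pos_dict.items() if len(t) == 1}
    return multi, single

def features(words, i):
    """Assumed feature set: the word itself plus its left and right neighbours."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}"]

def predict_proba(weights, feats, candidates):
    """Softmax over the candidate tags, using summed feature weights as scores."""
    scores = {t: sum(weights[(f, t)] for f in feats) for t in candidates}
    top = max(scores.values())
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def build_examples(tagged_sentences, multi):
    """Collect one training example per multi-category token in the corpus."""
    examples = []
    for words, tags in tagged_sentences:
        for i, (w, t) in enumerate(zip(words, tags)):
            if w in multi:
                examples.append((features(words, i), t, sorted(multi[w])))
    return examples

def train(examples, epochs=50, lr=0.5):
    """Fit weights by stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold, candidates in examples:
            probs = predict_proba(weights, feats, candidates)
            for t in candidates:
                grad = (1.0 if t == gold else 0.0) - probs[t]
                for f in feats:
                    weights[(f, t)] += lr * grad
    return weights

def tag(words, multi, single, weights):
    """Dictionary lookup for unambiguous words, model decision for ambiguous ones."""
    result = []
    for i, w in enumerate(words):
        if w in single:
            result.append(single[w])
        elif w in multi:
            probs = predict_proba(weights, features(words, i), sorted(multi[w]))
            result.append(max(probs, key=probs.get))
        else:
            result.append("UNK")  # out-of-dictionary word: not handled in this sketch
    return result

# Hypothetical two-sentence tagged corpus.
CORPUS = [
    (["cái", "bàn", "này"], ["C", "N", "P"]),   # "this table"
    (["tôi", "bàn", "việc"], ["P", "V", "N"]),  # "I discuss the matter"
]
MULTI, SINGLE = split_dictionary(POS_DICT)
WEIGHTS = train(build_examples(CORPUS, MULTI))
print(tag(["tôi", "bàn", "việc"], MULTI, SINGLE, WEIGHTS))
```

Restricting the softmax to the dictionary's candidate tags means the model only ever chooses among tags the dictionary permits for that word, which is one plausible way to combine a POS dictionary with a disambiguation model.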
Pages: 267-277
Number of pages: 11