Tagging Icelandic text: A linguistic rule-based approach

Cited by: 17
Authors
Loftsson, Hrafn [1]
Affiliations
[1] Reykjavik Univ, Sch Comp Sci, IS-103 Reykjavik, Iceland
Keywords
data-driven tagging; disambiguator; linguistic rule-based tagging; simple voting; unknown word guesser
DOI
10.1017/S0332586508001820
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Icelandic is a morphologically complex language, for which a large tagset has been created. This paper describes the design of a linguistic rule-based system for part-of-speech tagging of Icelandic text. The system contains two main components: a disambiguator, IceTagger, and an unknown word guesser, IceMorphy. IceTagger uses a small number of local elimination rules along with a global heuristics component. The heuristics guess the functional roles of the words in a sentence, mark prepositional phrases, and use the acquired knowledge to force feature agreement where appropriate. IceMorphy is used for guessing the tag profile for unknown words and for automatically filling tag profile gaps in the lexicon. Evaluation shows that IceTagger achieves 91.54% accuracy, a substantial improvement on the highest accuracy, 90.44%, obtained using three state-of-the-art data-driven taggers. Furthermore, the accuracy increases to 92.95% when IceTagger is combined with two data-driven taggers in a simple voting scheme. The development time of the tagging system was only seven man-months, which can be considered a short development period for a linguistic rule-based system.
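The simple voting scheme mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the taggers' outputs are combined in detail, so this assumes plain per-token majority voting, with ties resolved in favour of one designated tagger. The function name `vote`, the `tie_breaker` parameter, and the example tag strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def vote(tag_sequences, tie_breaker=0):
    """Combine per-token tags from several taggers by majority vote.

    tag_sequences: one list of tags per tagger, all of the same length.
    tie_breaker:   index of the tagger whose tag is used when no single
                   tag has the most votes (an assumption of this sketch).
    """
    lengths = {len(seq) for seq in tag_sequences}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of tokens")

    combined = []
    for proposals in zip(*tag_sequences):
        counts = Counter(proposals)
        best_tag, best_count = counts.most_common(1)[0]
        # If several tags share the highest count, there is no majority:
        # fall back to the designated tagger's proposal.
        if list(counts.values()).count(best_count) > 1:
            best_tag = proposals[tie_breaker]
        combined.append(best_tag)
    return combined


# Illustrative tag strings only; one list per tagger, one tag per token.
rule_based = ["fp1en", "sfg1en", "nheo"]
data_driven_a = ["fp1en", "sfg1en", "nhfo"]
data_driven_b = ["fp1en", "sfg3en", "nheo"]

print(vote([rule_based, data_driven_a, data_driven_b]))
```

With three taggers, a tie can only occur when all three disagree, so the choice of `tie_breaker` decides which tagger is trusted in exactly those cases.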
Pages: 47-72
Page count: 26