Discovering Continuous Multi-word Expressions in Czech

Authors
Neverilova, Zuzana [1]
Affiliation
[1] Masaryk Univ, Fac Informat, Brno, Czech Republic
Source
COMPUTACION Y SISTEMAS | 2018, Vol. 22, No. 3
Keywords
Multiword expression; multi-word expression; MWE; MWE discovery; inter-lingual homographs
DOI
10.13053/CyS-22-3-3022
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-word expressions frequently cause incorrect annotations in corpora, since they often contain foreign words or syntactic anomalies. In the case of foreign material, the annotation quality depends on whether the correct language of the sequence is detected. With inter-lingual homographs, this problem becomes difficult. In previous work, we created a dataset of Czech continuous multi-word expressions (MWEs). The candidates were discovered automatically from a Czech web corpus, taking their orthographic variability into account, and were then classified and annotated manually. Afterwards, the dataset was extended automatically by generating all word forms of those MWEs that were annotated as nouns. In this work, we used the dataset as positive examples and obtained negative examples by filtering the MWE candidates. We trained a classifier with a mean accuracy of 92.7%. We show that the combined approach slightly outperforms approaches based only on association measures, mainly on MWEs containing inter-lingual homographs and out-of-vocabulary words. The discovery methods can be applied to other languages that exhibit orthographic variability in web corpora.
Pages: 845-852
Page count: 8
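
Illustrative note
The abstract compares the trained classifier against approaches based only on association measures. As a rough illustration of what such a baseline can look like, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). The choice of PMI, the function name `bigram_pmi`, the `min_count` threshold, and the toy English token list are all assumptions made for this example; the record does not say which association measures the paper used.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; not the paper's method.
    Pairs seen fewer than `min_count` times are skipped, because PMI is
    unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (assumed data, not taken from the paper's Czech web corpus).
tokens = "new york is big and new york is far but the city is new".split()
scores = bigram_pmi(tokens)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

A positive score means the pair co-occurs more often than its word frequencies alone would predict, which is the usual signal for an MWE candidate. Such frequency-based scores carry no information about the language of a token, which may be one reason they are weaker on the inter-lingual homographs and out-of-vocabulary words that the abstract singles out.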