Slovene Multi-word Units: Identification, Categorization, and Representation

Authors
Gantar, Polona [1]
Čibej, Jaka [2]
Bon, Mija [1]
Affiliations
[1] University of Ljubljana, Faculty of Arts, Ljubljana, Slovenia
[2] Jožef Stefan Institute, Ljubljana, Slovenia
Keywords
Multi-word units; Slovene; Identification; Categorization; Multi-word lexicon; Expressions
DOI
10.1007/978-3-030-30135-4_8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present the results of a manual annotation of a Slovene training corpus with multi-word units (MWUs) relevant for inclusion in a lexicon of Slovene MWUs. We analyze the annotations in terms of (a) the frequency with which a string has been identified as an MWU, (b) the degree to which the annotators agree on the category of the identified MWU, and (c) the degree to which the annotators agree on the range of the MWU in terms of its lexicalized elements. The results of the analysis will be useful at different stages of the compilation of a Slovene MWU lexicon. The list of dictionary-relevant MWUs obtained in the annotation task will be used to enrich the lexicon and to train models for the automatic identification of MWUs in running text. The findings will also help revise the criteria for identifying and categorizing dictionary-relevant MWUs as distinct from free phrases, and define more clearly the distinction between the lexicalized elements of MWUs and the more or less stable elements of their textual environment. This distinction will be useful when determining the canonical forms of MWUs in the lexicon on the one hand, and their relation to their variable elements and syntactic conversions on the other.
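The abstract names three measures computed over the annotations: (a) how often a string was identified as an MWU, (b) agreement on its category, and (c) agreement on its range of lexicalized elements. The Python sketch below shows one possible way to compute such per-candidate statistics; it is not the authors' procedure. The `Annotation` record, the placeholder category labels, and the choice of majority-category share for (b) and mean pairwise Jaccard overlap of token positions for (c) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record: one annotator marking one candidate string as an MWU.
    annotator: str
    candidate: str          # identifier of the annotated string
    category: str           # MWU category label chosen by the annotator (placeholder)
    span: frozenset[int]    # token positions the annotator judged to be lexicalized


def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
    """Overlap of two token-position sets (1.0 means identical ranges)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def summarize(annotations: list[Annotation], n_annotators: int) -> dict[str, dict[str, float]]:
    """Compute assumed versions of measures (a), (b) and (c) for each candidate."""
    by_candidate: dict[str, list[Annotation]] = {}
    for ann in annotations:
        by_candidate.setdefault(ann.candidate, []).append(ann)

    summary: dict[str, dict[str, float]] = {}
    for cand, anns in by_candidate.items():
        # (a) share of all annotators who identified the string as an MWU
        identification = len({a.annotator for a in anns}) / n_annotators
        # (b) share of annotations that agree with the most frequent category
        top_count = Counter(a.category for a in anns).most_common(1)[0][1]
        category_agreement = top_count / len(anns)
        # (c) mean pairwise Jaccard overlap of the lexicalized-element spans;
        #     a single annotation trivially agrees with itself (1.0)
        pairs = list(combinations(anns, 2))
        range_agreement = (
            sum(jaccard(x.span, y.span) for x, y in pairs) / len(pairs) if pairs else 1.0
        )
        summary[cand] = {
            "identification": identification,
            "category_agreement": category_agreement,
            "range_agreement": range_agreement,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical toy data: three of four annotators mark candidate "c1".
    toy = [
        Annotation("ann1", "c1", "A", frozenset({0, 1})),
        Annotation("ann2", "c1", "A", frozenset({0, 1, 2})),
        Annotation("ann3", "c1", "B", frozenset({0, 1})),
    ]
    print(summarize(toy, n_annotators=4))
```

On the toy data, candidate "c1" gets an identification share of 0.75, a category agreement of 2/3, and a range agreement of about 0.78, since one annotator included an extra token in the lexicalized span. Jaccard overlap was chosen here only because it penalizes partial range disagreement gradually rather than treating any mismatch as total disagreement; other agreement coefficients would be equally plausible.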
Pages: 99-112
Number of pages: 14