Creation of lexical resources for a characterisation of multiword expressions in Italian

Authors
Zaninello, Andrea [1]
Nissim, Malvina [1]
Affiliation
[1] Univ Bologna, Alma Mater Studiorum, Dipartimento Studi Linguistici & Orientali, I-40126 Bologna, Italy
DOI
Not available
Chinese Library Classification
H [Language, Writing]
Discipline classification code
05
Abstract
The theoretical characterisation of multiword expressions (MWEs) is tightly connected to their actual occurrences in data and to their representation in lexical resources. We present three lexical resources for Italian MWEs: an electronic lexicon, a series of example corpora, and a database of MWEs organised around morphosyntactic patterns. These resources are matched against, and created from, a very large web-derived corpus of Italian that spans registers and domains. We can thus test expressions coded by lexicographers in a dictionary, discarding unattested expressions and revisiting the lexicographers' choices on the basis of frequency information, while at the same time creating an example sub-corpus for each entry. We organise MWEs on the basis of the morphosyntactic information obtained from the data in a flexible electronic knowledge base containing structured annotation that can be exploited for multiple purposes. We also suggest directions for further work on characterising MWEs by analysing the data organised in our database through lexico-semantic information available in WordNet or MultiWordNet-like resources, with a view to expanding the set of MWEs through the extraction of other similar compact expressions.
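The workflow described in the abstract (matching dictionary entries against a corpus, discarding unattested ones, collecting examples per entry, and grouping entries by morphosyntactic pattern) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the lexicon entries, the tag labels, and the names `find_matches` and `build_resources` are all hypothetical, and matching is simplified to contiguous lemma sequences.

```python
from collections import defaultdict

# Hypothetical toy corpus: each sentence is a list of (token, lemma, pos) triples.
CORPUS = [
    [("Ha", "avere", "VERB"), ("una", "uno", "DET"), ("carta", "carta", "NOUN"),
     ("di", "di", "ADP"), ("credito", "credito", "NOUN")],
    [("Le", "il", "DET"), ("carte", "carta", "NOUN"), ("di", "di", "ADP"),
     ("credito", "credito", "NOUN"), ("sono", "essere", "VERB"), ("utili", "utile", "ADJ")],
    [("Cerca", "cercare", "VERB"), ("l'", "il", "DET"), ("anima", "anima", "NOUN"),
     ("gemella", "gemello", "ADJ")],
]

# Hypothetical dictionary entries, stored as lemma sequences.
LEXICON = [
    ("carta", "di", "credito"),
    ("anima", "gemello"),        # lemma form of "anima gemella"
    ("colpo", "di", "fulmine"),  # does not occur in the toy corpus
]


def find_matches(sentence, mwe):
    """Return start indices where the lemma sequence `mwe` occurs contiguously."""
    lemmas = [lemma for _, lemma, _ in sentence]
    n = len(mwe)
    return [i for i in range(len(lemmas) - n + 1) if tuple(lemmas[i:i + n]) == mwe]


def build_resources(corpus, lexicon):
    """Build frequency counts, example sub-corpora, and a pattern index."""
    examples = defaultdict(list)  # entry -> sentences in which it is attested
    patterns = defaultdict(set)   # POS pattern -> entries realised with that pattern
    for sentence in corpus:
        text = " ".join(token for token, _, _ in sentence)
        for mwe in lexicon:
            for start in find_matches(sentence, mwe):
                examples[mwe].append(text)
                span = sentence[start:start + len(mwe)]
                patterns[" ".join(pos for _, _, pos in span)].add(mwe)
    # Entries with no match never enter `examples`, so they are dropped here.
    attested = {mwe: len(sents) for mwe, sents in examples.items()}
    return attested, dict(examples), dict(patterns)


attested, examples, patterns = build_resources(CORPUS, LEXICON)
```

Here `attested` holds a corpus frequency for each entry that was found, `examples` plays the role of the per-entry example sub-corpus, and `patterns` indexes entries by part-of-speech sequence. A realistic version would also need to handle discontinuous and inflectionally variable expressions, which this sketch ignores.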
Pages: 8