An account of the challenge of tagging a reference corpus for Brazilian Portuguese

Cited by: 0
Authors
Aluísio, S
Pelizzoni, J
Marchi, AR
de Oliveira, L
Manenti, R
Marquiafável, V
Affiliations
[1] Univ Sao Paulo, ICMC, DCCE, BR-13560970 Sao Carlos, SP, Brazil
[2] USP, ICMC, NILC, BR-13560970 Sao Carlos, SP, Brazil
Source
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS | 2003 / Vol. 2721
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article identifies and addresses the major linguistic/conceptual, as opposed to logistic, issues faced in the morphosyntactic tagging of MAC-Morpho, a 1.1 million word Brazilian Portuguese corpus of newspaper articles that has been developed in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria for establishing the tagset and analyze some interesting cases amongst the linguistic problems we faced in this work.
Pages: 110 - 117
Page count: 8
Related papers
50 items in total
  • [21] Guidelines for a Brazilian Portuguese Corpus Composed by Texts Written for Children
    Brangel, Larissa Moreira
    Sartori, Beatriz Nogueira
    da Camara, Margot Luiza Pedron
    CALIGRAMA-REVISTA DE ESTUDOS ROMANICOS, 2024, 29 (01) : 24 - 42
  • [22] Construction of a Corpus of 80 Implicit Causality Verbs in Brazilian Portuguese
    Barbalho, Rute da Silva
    Souza de Carvalho, Renata Sabrinne
    Godoy, Mahayana Cristina
    REVISTA DE ESTUDOS DA LINGUAGEM, 2024, 32 (03) : 804 - 823
  • [23] Common Language in Scientific Articles in Brazilian Portuguese: a study based on corpus
    Loguercio, Sandra Dias
    ANTARES-LETRAS E HUMANIDADES, 2020, 12 (25) : 140 - 164
  • [24] CONSTRAINTS ON THE USAGE OF VERBAL NEGATION IN BRAZILIAN PORTUGUESE - EVIDENCE FROM A SPOKEN CORPUS
    Lima e Silva, Luis Filipe
    Mello, Heliana
    REVISTA KANINA, 2016, 40 (01) : 71 - 82
  • [25] CONSTRAINTS ON THE USAGE OF VERBAL NEGATION IN BRAZILIAN PORTUGUESE - EVIDENCE FROM A SPOKEN CORPUS
    Lima e Silva, Luis Filipe
    Mello, Heliana
    REVISTA KANINA, 2016, 40 (04) : 123 - 134
  • [26] For a study of Brazilian Portuguese formation: description, representativeness and potentialities of the CEDOHS colonial corpus
    Cardoso, Lara da Silva
    Novais Carneiro, Zenaide de Oliveira
    de Oliveira Lacerda, Mariana Fagundes
    LABORHISTORICO, 2021, 7 : 330 - 355
  • [27] Restrictions on Long Passives in English and Brazilian Portuguese: A Phase-Based Account
    Sheehan, Michelle
    Cyrino, Sonia
    LINGUISTIC INQUIRY, 2024, 55 (04) : 769 - 803
  • [28] Defining a state-of-the-art POS-tagging environment for Brazilian Portuguese clinical texts
    de Oliveira L.F.A.
    e Oliveira L.E.S.
    Gumiel Y.B.
    Carvalho D.R.
    Moro C.M.C.
    Research on Biomedical Engineering, 2020, 36 (03) : 267 - 276
  • [29] Tagging the Dutch PAROLE corpus
    de Does, J
    van der Kleij, JVDV
    COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS 2001, 2002, (45) : 62 - 76
  • [30] THE MORPHOPHONOLOGICAL EXPRESSION OF VARIABLE NUMBER AGREEMENT IN BRAZILIAN PORTUGUESE: AN ACCOUNT OF PRESCHOOLERS' ACQUISITION AND PRODUCTION
    da Silva Passos Jakubow, Ana Paula
    Sicuro Correa, Leticia M.
    REVISTA DA ANPOLL, 2018, 1 (45) : 47 - 67