An account of the challenge of tagging a reference corpus for Brazilian Portuguese

Cited by: 0
Authors
Aluísio, S
Pelizzoni, J
Marchi, AR
de Oliveira, L
Manenti, R
Marquiafável, V
Affiliations
[1] Univ Sao Paulo, ICMC, DCCE, BR-13560970 Sao Carlos, SP, Brazil
[2] USP, ICMC, NILC, BR-13560970 Sao Carlos, SP, Brazil
Source
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS | 2003 / Vol. 2721
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article identifies and addresses the major linguistic/conceptual, as opposed to logistic, issues faced in the morphosyntactic tagging of MAC-Morpho, a 1.1 million word Brazilian Portuguese corpus of newspaper articles that has been developed in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria for establishing the tagset and analyze some interesting cases amongst the linguistic problems we faced in this work.
Pages: 110-117
Page count: 8
Related papers
50 in total
  • [41] Error Tagging in the Lithuanian Learner Corpus
    Ruzaite, Jurate
    Dereskeviciute, Sigita
    Kavaliauskaite-Vilkiniene, Viktorija
    Krivickaite-Leisiene, Egle
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE (HLT 2020), 2020, 328 : 253 - 260
  • [42] Keys to pragmatic tagging in the CHILDHUM corpus
    Ortega-Gilabert, Jose Antonio
    Timofeeva-Timofeev, Larissa
    CIRCULO DE LINGUISTICA APLICADA A LA COMUNICACION, 2023, (96) : 59 - 71
  • [43] Construction of Word Sense Tagging Corpus
    Zan, Hongying
    Chen, JunYi
    Cheng, XiaoYu
    Mu, Lingling
    CHINESE LEXICAL SEMANTICS, CLSW 2018, 2018, 11173 : 679 - 690
  • [44] Measurement Properties and Translation to Brazilian-Portuguese of the Challenge for Children and Adolescents with Cerebral Palsy
    Sousa Junior, Ricardo Rodrigues
    Gontijo, Ana Paula Bensemann
    Santos, Thiago Ribeiro Teles
    Wright, F. Virginia
    Mancini, Marisa C.
    PHYSICAL & OCCUPATIONAL THERAPY IN PEDIATRICS, 2021, 41 (04) : 372 - 389
  • [45] Tagging a Hebrew Corpus: The Case of Participles
    Adler, Meni
    Netzer, Yael
    Goldberg, Yoav
    Gabay, David
    Elhadad, Michael
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008 : 3167 - 3174
  • [46] Fakepedia Corpus: A Flexible Fake News Corpus in Portuguese
    Charles, Anderson Cordeiro
    Ruback, Livia
    Oliveira, Jonice
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 37 - 45
  • [47] The COPLE2 Corpus: a Learner Corpus for Portuguese
    Mendes, Amalia
    Antunes, Sandra
    Janssen, Maarten
    Goncalves, Anabela
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016 : 3207 - 3214
  • [48] BRAZILIAN PORTUGUESE DICTIONARIES
    SILBERGER, KK
    RQ, 1989, 28 (03) : 334 - 339
  • [49] The subject in Brazilian Portuguese
    Kröll, H
    ARCHIV FUR DAS STUDIUM DER NEUEREN SPRACHEN UND LITERATUREN, 1998, 235 (02) : 457 - 458
  • [50] BRAZILIAN PORTUGUESE SUNG
    Carvalho, Flavio
    OPUS, 2006, 12 : 188 - 192