An account of the challenge of tagging a reference corpus for Brazilian Portuguese

Cited by: 0
Authors
Aluísio, S
Pelizzoni, J
Marchi, AR
de Oliveira, L
Manenti, R
Marquiafável, V
Affiliations
[1] Univ Sao Paulo, ICMC, DCCE, BR-13560970 Sao Carlos, SP, Brazil
[2] USP, ICMC, NILC, BR-13560970 Sao Carlos, SP, Brazil
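Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes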
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article identifies and addresses the major linguistic/conceptual, as opposed to logistic, issues faced in the morphosyntactic tagging of MAC-Morpho, a 1.1 million word Brazilian Portuguese corpus of newspaper articles that has been developed in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria for establishing the tagset and analyze some interesting cases amongst the linguistic problems we faced in this work.
Pages: 110-117
Page count: 8