An account of the challenge of tagging a reference corpus for Brazilian Portuguese

Authors
Aluísio, S
Pelizzoni, J
Marchi, AR
de Oliveira, L
Manenti, R
Marquiafável, V
Affiliations
[1] Univ Sao Paulo, ICMC, DCCE, BR-13560970 Sao Carlos, SP, Brazil
[2] USP, ICMC, NILC, BR-13560970 Sao Carlos, SP, Brazil
Source
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS | 2003 / Vol. 2721
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article identifies and addresses the major linguistic/conceptual, as opposed to logistic, issues faced in the morphosyntactic tagging of MAC-Morpho, a 1.1-million-word Brazilian Portuguese corpus of newspaper articles that has been developed in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria for establishing the tagset and analyze some interesting cases among the linguistic problems we faced in this work.
Pages: 110-117
Page count: 8