The COPLE2 Corpus: a Learner Corpus for Portuguese

Cited by: 0
Authors
Mendes, Amalia [1]
Antunes, Sandra [1]
Janssen, Maarten [1]
Goncalves, Anabela [1]
Affiliation
[1] Univ Lisbon, FLUL, CLUL, Alameda Univ, Lisbon, Portugal
Keywords
learner corpus; corpus compilation; language learning/teaching
DOI
Not available
Chinese Library Classification (CLC) number
H [Language and Writing]
Discipline classification code
05
Abstract
We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language. The corpus currently comprises 182,474 tokens and 978 texts, classified according to the CEFR scales. The original handwritten productions are transcribed in TEI-compliant XML and keep a record of all the original information, such as reformulations, insertions and corrections made by the teacher, while the recordings are transcribed and aligned with EXMARaLDA. The TEITOK environment provides different views of the same document (XML, student version, corrected version), a CQP-based search interface, and POS tagging, lemmatization and normalization of the tokens; it will soon be used for error annotation in stand-off format. The corpus has already served as a source of data for phonological, lexical and syntactic interlanguage studies and will be used for a data-informed selection of language features for each proficiency level.
Pages: 3207 - 3214
Number of pages: 8