LINGUISTICALLY ANNOTATED SPOKEN NGANASAN CORPUS

Authors
Wagner-Nagy, Beata [1]
Szeverenyi, Sandor [1]
Affiliations
[1] Inst Finnougrist Uralist, Von Melle Pk 6, D-20146 Hamburg, Germany
Keywords
Nganasan; annotation; corpus; endangered language; language documentation
DOI
Not available
Chinese Library Classification (CLC) number
Q98 [Anthropology]
Discipline classification code
030303
Abstract
The paper discusses key issues in the annotation method employed in the project "Linguistically annotated spoken Nganasan corpus". The data are processed and stored in the EXMARaLDA format. Annotation of the database involves grammatical and part-of-speech tagging (done in Toolbox or FLEx) and translation into Russian and English. The present paper, however, addresses the questions of syntactic roles and information structure. For this purpose we use a format designed by other researchers, which we have adapted to the Nganasan language. We describe the annotation system (tags, terms, and their clarification) and illustrate it with a large number of Nganasan examples.
Pages: 25-34
Page count: 10