A Corpus of Native, Non-native and Translated Texts

Cited by: 0
Authors
Nisioi, Sergiu [1]
Rabinovich, Ella [2]
Dinu, Liviu P. [1]
Wintner, Shuly [2]
Affiliations
[1] Univ Bucharest, Ctr Computat Linguist, Bucharest, Romania
[2] Univ Haifa, Dept Comp Sci, Haifa, Israel
Keywords
Corpus linguistics; Translation; Bilingualism; Second language acquisition
DOI
Not available
Chinese Library Classification (CLC) number
H [Language, Writing]
Subject classification code
05
Abstract
We describe a monolingual English corpus of original and (human) translated texts, with an accurate annotation of speaker properties, including the original language of the utterances and the speaker's country of origin. We thus obtain three sub-corpora of texts reflecting native English, non-native English, and English translated from a variety of European languages. This dataset will facilitate the investigation of similarities and differences between these kinds of sub-languages. Moreover, it will facilitate a unified comparative study of translations and language produced by (highly fluent) non-native speakers, two closely related phenomena that have only been studied in isolation so far.
Pages: 4197-4201
Page count: 5