Building a Case-based Semantic English-Chinese Parallel Treebank

Cited by: 0
Authors
Shi, Huaxing [1]
Zhao, Tiejun [1]
Su, Keh-Yih [2]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] Acad Sinica, 128 Acad Rd,Sect 2, Taipei 11529, Taiwan
Keywords
English-Chinese semantic constituent parallel Treebank; case tree annotation; semantic machine translation corpus
DOI
Not available
Chinese Library Classification (CLC) number
H [Language, Writing]
Discipline classification code
05
Abstract
We construct a case-based English-to-Chinese semantic constituent parallel Treebank for Statistical Machine Translation (SMT) by labelling each node of the Deep Syntactic Tree (DST) with our refined semantic cases. Because subtree span-crossing is harmful in tree-based SMT, we adopt the DST to alleviate this problem. We also tailor an existing case set to represent bilingual shallow semantic relations more precisely. The Treebank is part of a semantic corpus building project that aims to build a bilingual corpus annotated with syntactic information, semantic cases, and word senses. The data come from the news domain of the Datum corpus; 4,000 sentence pairs were selected to cover as many lexical items and part-of-speech (POS) n-gram patterns as possible. This paper presents the construction of the case Treebank and tests how well the DST structure alleviates subtree span-crossing. Our preliminary analysis shows that transforming the parse tree into the DST significantly increases the compatibility between Chinese and English trees. Furthermore, the human agreement rate in annotation is acceptable (90% for English nodes, 75% for Chinese nodes).
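The notion of subtree span-crossing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact definition or procedure, so the code below assumes a common one: a source subtree is "crossing" if its span, projected through word alignment links, partially overlaps a target-side constituent (neither nested in it nor disjoint from it). All function names and the toy data are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_span(src_span: Span, alignment: Set[Tuple[int, int]]) -> Optional[Span]:
    """Project a source span onto the target side via (src, tgt) alignment links."""
    start, end = src_span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None  # no word in the span is aligned
    return (min(targets), max(targets))


def spans_cross(a: Span, b: Span) -> bool:
    """True if the spans partially overlap: neither disjoint nor nested."""
    overlap = a[0] <= b[1] and b[0] <= a[1]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return overlap and not (a_in_b or b_in_a)


def is_span_crossing(
    src_span: Span,
    alignment: Set[Tuple[int, int]],
    tgt_constituents: Iterable[Span],
) -> bool:
    """Assumed definition: the projected span crosses some target constituent."""
    projected = project_span(src_span, alignment)
    if projected is None:
        return False
    return any(spans_cross(projected, c) for c in tgt_constituents)


# Toy example (hypothetical data): source words 1 and 2 are aligned in swapped order.
alignment = {(0, 0), (1, 2), (2, 1), (3, 3)}
tgt_constituents = [(0, 1), (2, 3), (0, 3)]

crossing = is_span_crossing((1, 2), alignment, tgt_constituents)
not_crossing = is_span_crossing((0, 0), alignment, tgt_constituents)
```

Under this assumed definition, counting crossing subtrees before and after a tree transformation gives one rough way to compare how compatible two tree structures are; it is not necessarily the measure used in the paper.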
Pages: 2918-2924
Number of pages: 7