Improving Chinese to English SMT with multiple CWS results

Authors
Ma, Yongliang [1]
Zhao, Tiejun [1]
Affiliation
[1] Harbin Inst Technol, MOE Microsoft Key Lab Nat Language Proc & Speech, Harbin 150006, Peoples R China
Keywords
Chinese word segmentation; SMT; feature blending; feature interpolation
DOI
10.1109/IALP.2009.36
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In Chinese-to-English statistical machine translation (SMT), Chinese text always requires a pre-processing step that segments sentences into words; the standard approach is Chinese word segmentation (CWS). However, CWS was not developed for SMT, so its results are not necessarily optimal for SMT. In recent years, many studies have sought to make CWS more suitable for SMT; we explore the problem from another direction. Our basic idea is to use multiple CWS results as additional sources of linguistic knowledge, and we present a simple and effective approach for using multiple CWS results in SMT. We also report experimental results over a range of strategy settings and obtain substantial improvements in Chinese-to-English translation performance. In the best result, we gain 1.89 BLEU percentage points over a state-of-the-art HPBT baseline system that does not use multiple CWS results.
Pages: 135-140
Number of pages: 6