Research and Implementation on Machine Translation System with Online Corpora Extraction Technology

Cited by: 1
Author
Lin Chirong [1]
Affiliation
[1] Changsha Aeronaut Vocat & Tech Coll, Changsha 410014, Hunan, Peoples R China
Keywords
corpora; extraction; bilingual parallel; MTS; webpages
DOI
10.1109/ISDEA.2014.172
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bilingual parallel sentence pairs are an important resource for machine translation. Because the ways of obtaining them are limited, sentence-level parallel corpora are not only small in quantity but also concentrated in specific fields, so they are difficult to adapt to real application requirements. This paper introduces a Web-based system for automatically acquiring bilingual parallel sentence pairs. The system integrates the advantages of current systems and improves their key technologies. We propose a URL naming method for the automatic discovery of bilingual websites and improve the technology for extracting bilingual parallel sentence pairs. Experimental results show that the proposed methods greatly improve the recall of candidate bilingual website discovery. For the acquisition of bilingual parallel sentence pairs, the recall is 93% and the accuracy is 96%, which demonstrates the effectiveness of the approach. In addition, this paper studies bilingual parallel sentence pairs within bilingual websites and obtains some preliminary results. Multiple groups of statistical machine translation experiments show that our method can improve the performance of a machine translation system, so that it can contribute to the practical application of online corpora.
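The abstract does not spell out the URL naming rules, so the following is only a minimal sketch of one common way such a method can work: URLs whose paths differ only in a language marker are paired as candidate translations. The marker table `LANG_MARKERS`, the function names `normalize`, `pair_candidates`, and `precision_recall`, and the `example.com` URLs are all hypothetical, and treating the reported "accuracy" as precision is an assumption.

```python
import re
from urllib.parse import urlsplit

# Hypothetical language markers; the paper's actual naming rules
# are not given in the abstract.
LANG_MARKERS = {
    "en": ("en", "eng", "english"),
    "zh": ("zh", "cn", "chs", "chinese"),
}

# Split a path into alphabetic and non-alphabetic runs.
_TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def normalize(url, lang):
    """Replace markers of `lang` in the URL path with a placeholder.

    Returns None when the path contains no marker for `lang`.
    """
    parts = urlsplit(url)
    markers = LANG_MARKERS[lang]
    tokens = _TOKEN.findall(parts.path)
    replaced = ["{LANG}" if t.lower() in markers else t for t in tokens]
    if replaced == tokens:
        return None
    return parts.netloc.lower() + "".join(replaced)


def pair_candidates(urls, src="en", tgt="zh"):
    """Pair URLs that become identical once language markers are masked."""
    by_key = {}
    for url in urls:
        key = normalize(url, src)
        if key is not None:
            by_key.setdefault(key, []).append(url)
    pairs = []
    for url in urls:
        key = normalize(url, tgt)
        for src_url in by_key.get(key, []):
            pairs.append((src_url, url))
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against a gold set."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


urls = [
    "http://example.com/en/news/index_en.html",
    "http://example.com/cn/news/index_cn.html",
    "http://example.com/about.html",
]
print(pair_candidates(urls))
```

Matching on the URL alone is cheap, so it suits a first pass that proposes candidates; whether a pair is really parallel would still need a content-based check, which this sketch leaves out.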
Pages: 759-763
Page count: 5