Effect of Linguistic Information in Neural Machine Translation

Cited by: 0
Authors
Nakamura, Naomichi [1]
Isahara, Hitoshi [2]
Affiliations
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
[2] Toyohashi Univ Technol, Informat & Media Ctr, Toyohashi, Aichi, Japan
Keywords
DOI
Not available
CLC number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Deep neural networks (DNNs) outperform earlier approaches in many fields, including natural language processing. Neural machine translation (NMT) likewise outperforms statistical machine translation (SMT), which relies on complex features and rules. However, NMT requires a large corpus and long computation time. To reduce computational cost, recent studies have replaced low-frequency words with symbols. These symbols, however, make sentences ambiguous and degrade translation accuracy. To solve this problem, sub-word units such as Byte Pair Encoding (BPE) and the Wordpiece Model (WPM), which build a vocabulary of a prespecified size, have been proposed. Nevertheless, these tokenization methods break words apart and treat the pieces as symbols. Such symbols are compatible with neural networks, and NMT performance has improved. This result suggests that linguistic correctness is not necessarily important in NMT. If so, to what extent does linguistic correctness contribute to NMT accuracy? In this research, we experiment with incorporating linguistic information into sub-word units. Our experiments show that morphemes, as linguistic information, are a helpful factor for sub-word units.
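The abstract does not describe the authors' implementation. As a rough illustration of how BPE builds a vocabulary of a prespecified size, the following is a minimal sketch of the commonly described merge-learning loop. The function name `learn_bpe`, the end-of-word marker `</w>`, and the toy input are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words.

    Hypothetical helper for illustration; not the paper's code.
    """
    # Each distinct word becomes a tuple of symbols plus an end-of-word
    # marker (an assumed convention), mapped to its corpus frequency.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair (ties go to the first pair seen).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab


if __name__ == "__main__":
    merges, vocab = learn_bpe(["ab", "ab", "ab", "cb"], num_merges=2)
    print(merges)
    print(dict(vocab))
```

In this sketch each merge adds one new symbol, so the number of merges is what fixes the final vocabulary size. The merges follow pair frequency alone, so the resulting units need not align with morpheme boundaries, which is the gap the abstract sets out to examine.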
Pages: 6
Related papers
50 in total
  • [21] LINGUISTIC MATERIALS FOR THE MACHINE TRANSLATION SYSTEMS
    Vicic, Jernej
    [J]. ANNALES-ANALI ZA ISTRSKE IN MEDITERANSKE STUDIJE-SERIES HISTORIA ET SOCIOLOGIA, 2016, 26 (04) : 751 - 766
  • [22] Neural Machine Translation
    Birch, Alexandra
    [J]. NATURAL LANGUAGE ENGINEERING, 2021, 27 (03) : 377 - 378
  • [23] Neural Machine Translation
    Jooste, Wandri
    Haque, Rejwanul
    Way, Andy
    [J]. MACHINE TRANSLATION, 2021, 35 (02) : 289 - 299
  • [24] Neural Machine Translation Advised by Statistical Machine Translation
    Wang, Xing
    Lu, Zhengdong
    Tu, Zhaopeng
    Li, Hang
    Xiong, Deyi
    Zhang, Min
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3330 - 3336
  • [25] Neural Machine Translation as a Novel Approach to Machine Translation
    Benkova, Lucia
    Benko, Lubomir
    [J]. DIVAI 2020: 13TH INTERNATIONAL SCIENTIFIC CONFERENCE ON DISTANCE LEARNING IN APPLIED INFORMATICS, 2020, : 499 - 508
  • [26] Neural Name Translation Improves Neural Machine Translation
    Li, Xiaoqing
    Yan, Jinghui
    Zhang, Jiajun
    Zong, Chengqing
    [J]. MACHINE TRANSLATION, CWMT 2018, 2019, 954 : 93 - 100
  • [27] Cheat Codes to Quantify Missing Source Information in Neural Machine Translation
    Pal, Proyag
    Heafield, Kenneth
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 2472 - 2477
  • [28] Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
    Shao, Chenze
    Feng, Yang
    Zhang, Jinchao
    Meng, Fandong
    Chen, Xilin
    Zhou, Jie
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3013 - 3024
  • [29] Semantic and syntactic information for neural machine translation: Injecting Features to the Transformer
    Armengol-Estape, Jordi
    Costa-jussa, Marta R.
    [J]. MACHINE TRANSLATION, 2021, 35 (01) : 3 - 17
  • [30] Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
    Xu, Yangyifan
    Liu, Yijin
    Meng, Fandong
    Zhang, Jiajun
    Xu, Jinan
    Zhou, Jie
    [J]. ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 511 - 516