Combining Subword Representations into Word-level Representations in the Transformer Architecture

Cited by: 0
Authors
Casas, Noe [1 ]
Costa-jussa, Marta R. [1 ]
Fonollosa, Jose A. R. [1 ]
Institutions
[1] Univ Politecn Cataluna, Barcelona, Spain
Keywords: (none listed)
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
In Neural Machine Translation, using word-level tokens leads to degradation in translation quality. The dominant approaches use subword-level tokens, but this increases the length of the sequences and makes it difficult to profit from word-level information such as POS tags or semantic dependencies. We propose a modification to the Transformer model that combines subword-level representations into word-level ones in the first layers of the encoder, reducing the effective length of the sequences in the following layers and providing a natural point to incorporate extra word-level information. Our experiments show that this approach maintains translation quality relative to the standard Transformer model when no extra word-level information is injected, and that it is superior to the currently dominant method for incorporating word-level source-language information into models based on subword-level vocabularies.
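The core combination step described in the abstract can be sketched in plain Python. This is a minimal illustration, assuming mean-pooling over each word's subword vectors and a BPE-style subword-to-word alignment; the paper's actual model learns the combination inside the encoder's first layers, and the function name and `word_ids` mapping here are illustrative assumptions:

```python
# Sketch: collapse subword embeddings into word-level vectors by averaging
# all subword vectors that belong to the same source word. This shortens
# the sequence seen by later encoder layers and yields one position per
# word where extra word-level features (e.g. POS tags) could be added.

def combine_subwords(subword_vecs, word_ids):
    """Average subword vectors that share the same word index.

    subword_vecs: list of equal-length float lists, one per subword token.
    word_ids: list mapping each subword position to its word index,
              e.g. BPE pieces "un@@", "related" -> word_ids [0, 0].
    """
    n_words = max(word_ids) + 1
    dim = len(subword_vecs[0])
    sums = [[0.0] * dim for _ in range(n_words)]
    counts = [0] * n_words
    for vec, w in zip(subword_vecs, word_ids):
        for j, x in enumerate(vec):
            sums[w][j] += x
        counts[w] += 1
    return [[s / counts[w] for s in sums[w]] for w in range(n_words)]

# Three subword tokens forming two words: ["un@@", "related"] + ["words"]
vecs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
word_ids = [0, 0, 1]
print(combine_subwords(vecs, word_ids))  # [[2.0, 3.0], [5.0, 6.0]]
```

Averaging is only one possible combination; attention-based or learned pooling over each word's subwords would fit the same interface.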
Pages: 66-71 (6 pages)
Related papers (50 total)
  • [1] Gating Mechanisms for Combining Character and Word-level Word Representations: An Empirical Study
    Balazs, Jorge A.; Matsuo, Yutaka. NAACL HLT 2019: Proceedings of the Student Research Workshop, 2019: 110-124.
  • [2] Subword-level Word Vector Representations for Korean
    Park, Sungjoon; Byun, Jeongmin; Baek, Sion; Cho, Yongseok; Oh, Alice. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), Vol 1, 2018: 2429-2438.
  • [3] Abstract recommendation system: beyond word-level representations
    Korolev, Vadim; Mitrofanov, Artem; Sattarov, Boris; Tkachenko, Valery. Abstracts of Papers of the American Chemical Society, 2019, 258.
  • [4] Linearity of word-level representations of multiple-valued networks
    Yanushkevich, S. N.; Shmerko, V. P.; Malyugin, V. D.; Dziurzanski, P.; Tomaszewska, A. M. Journal of Multiple-Valued Logic and Soft Computing, 2004, 10(02): 129-158.
  • [5] Learning Word-Level Confidence for Subword End-to-End ASR
    Qiu, David; Li, Qiujia; He, Yanzhang; Zhang, Yu; Li, Bo; Cao, Liangliang; Prabhavalkar, Rohit; Bhatia, Deepti; Li, Wei; Hu, Ke; Sainath, Tara N.; McGraw, Ian. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 6393-6397.
  • [6] Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations
    Chaudhary, Aditi; Zhou, Chunting; Levin, Lori; Neubig, Graham; Mortensen, David R.; Carbonell, Jaime G. 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), 2018: 3285-3295.
  • [7] HDP-CNN: Highway deep pyramid convolution neural network combining word-level and character-level representations for phishing website detection
    Zheng, Faan; Yan, Qiao; Leung, Victor C. M.; Yu, F. Richard; Ming, Zhong. Computers & Security, 2022, 114.
  • [8] Analysis of Subword based Word Representations Case Study: Fasttext Malayalam
    Vivek, M. R.; Chandran, Priya. 2022 IEEE 19th India Council International Conference (INDICON), 2022.
  • [9] Collaborative Viseme Subword and End-to-End Modeling for Word-Level Lip Reading
    Chen, Hang; Wang, Qing; Du, Jun; Wan, Gen-Shun; Xiong, Shi-Fu; Yin, Bao-Ci; Pan, Jia; Lee, Chin-Hui. IEEE Transactions on Multimedia, 2024, 26: 9358-9371.
  • [10] A Word-Level Token-Passing Decoder for Subword N-gram LVCSR
    Varjokallio, Matti; Kurimo, Mikko. 2014 IEEE Workshop on Spoken Language Technology (SLT 2014), 2014: 495-500.