Subword-based Compact Reconstruction of Word Embeddings

Cited by: 0
Authors
Sasaki, Shota [1, 2]
Suzuki, Jun [1, 2]
Inui, Kentaro [1, 2]
Affiliations
[1] RIKEN AIP, Tokyo, Japan
[2] Tohoku University, Sendai, Miyagi, Japan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Subword-based word embeddings have been proposed in the literature mainly to solve the out-of-vocabulary (OOV) word problem observed in standard word-based word embeddings. In this paper, we propose a method of reconstructing pre-trained word embeddings from subword information that can effectively represent a large number of subword embeddings in a considerably small fixed space. The method rests on two key techniques: memory-shared embeddings and a variant of the key-value-query self-attention mechanism. Our experiments show that the reconstructed subword-based embeddings can successfully imitate well-trained word embeddings in a small fixed space while preventing quality degradation across several linguistic benchmark datasets, and can simultaneously predict effective embeddings of OOV words. We also demonstrate the effectiveness of our reconstruction method when it is applied to downstream tasks.
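The abstract names the two techniques but gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes character n-grams as subwords, a CRC32 hash to map every subword into one fixed-size shared table, and a single query vector whose scaled dot products with the subword vectors give attention weights (keys and values are the same vectors here). The class name, n-gram range, table size, and random initialization are all hypothetical; in practice the table and query would be trained so that the output approximates a pre-trained word embedding.

```python
import math
import random
import zlib


def char_ngrams(word, n_min=3, n_max=5):
    """Return character n-grams of the word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


class SharedSubwordEmbedder:
    """Illustrative only: subwords share a fixed-size table via hashing."""

    def __init__(self, num_slots=1000, dim=8, seed=0):
        rng = random.Random(seed)
        self.num_slots = num_slots
        self.dim = dim
        # Fixed-size memory shared by all subwords (sizes are assumptions).
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                      for _ in range(num_slots)]
        # Query vector for attention; random here, learned in a real system.
        self.query = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    def slot(self, subword):
        # Deterministic hash, so distinct subwords may share one slot.
        return zlib.crc32(subword.encode("utf-8")) % self.num_slots

    def embed(self, word):
        # Fall back to the whole wrapped word if it yields no n-grams.
        subwords = char_ngrams(word) or [f"<{word}>"]
        vecs = [self.table[self.slot(s)] for s in subwords]
        # Scaled dot-product scores between the query and each subword vector.
        scores = [sum(q * k for q, k in zip(self.query, v)) / math.sqrt(self.dim)
                  for v in vecs]
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum of the subword vectors.
        return [sum(w * v[j] for w, v in zip(weights, vecs))
                for j in range(self.dim)]


embedder = SharedSubwordEmbedder(num_slots=50, dim=4, seed=1)
vector = embedder.embed("unseenword")  # works for any string, including OOV words
```

Because the table size is fixed up front, memory use does not grow with the number of distinct subwords, and any word, seen or unseen, receives a vector from its n-grams.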
Pages: 3498-3508
Number of pages: 11