Subword-Based Compact Reconstruction for Open-Vocabulary Neural Word Embeddings

Authors
Sasaki, Shota [1, 2]
Suzuki, Jun [1, 2]
Inui, Kentaro [1, 2]
Affiliations
[1] RIKEN, Ctr Adv Intelligence Project, Sendai, Miyagi 9808579, Japan
[2] Tohoku Univ, RIKEN, Sendai, Miyagi 9808579, Japan
Keywords
Task analysis; Memory management; Semantics; Indexes; Vocabulary; Syntactics; Speech processing; Neural word embeddings; open vocabulary; subwords
DOI
10.1109/TASLP.2021.3125133
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Neural word embeddings have become a fundamental resource for many applications in artificial intelligence (AI) research. They have been shown to capture high-quality syntactic and semantic relationships in a vector space. Despite their significant impact, neural word embeddings have several disadvantages. In this paper, we focus on two issues with well-trained word embeddings: (i) their massive memory requirement and (ii) their inapplicability to out-of-vocabulary (OOV) words. To overcome these two issues, we propose a method of reconstructing pre-trained word embeddings from subword information; the method represents a large number of subword embeddings in a considerably small fixed space while preventing quality degradation relative to the original word embeddings. The key techniques of our method are twofold: memory-shared embeddings and a variant of the key-value-query self-attention mechanism. Our experiments show that the reconstructed subword-based word embeddings can imitate well-trained word embeddings in a small fixed space while preventing quality degradation across several linguistic benchmark datasets, and can simultaneously predict effective embeddings for OOV words. We also demonstrate the effectiveness of our reconstruction method on downstream tasks such as named entity recognition and natural language inference.
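The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hashing scheme (`zlib.crc32`), the n-gram range, the table sizes, the single shared query vector, and the names `char_ngrams`, `slot`, and `embed` are all assumptions made for illustration, and the randomly initialised tables stand in for parameters that would be trained to reconstruct the pre-trained embeddings. The sketch only shows how hashing subwords into a fixed number of shared slots bounds memory, and how an attention-weighted sum over subwords yields a vector for any word, including an OOV word.

```python
import math
import random
import zlib

DIM = 8         # embedding dimension (illustrative assumption)
NUM_SLOTS = 64  # fixed number of shared embedding slots (illustrative assumption)


def char_ngrams(word, n_min=3, n_max=5):
    """Return character n-grams of the word padded with boundary markers."""
    padded = f"<{word}>"
    grams = [padded[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(padded) - n + 1)]
    # Fall back to the padded word itself if it is too short for any n-gram.
    return grams or [padded]


def slot(ngram, seed):
    """Map an n-gram to one of NUM_SLOTS shared slots with a deterministic hash."""
    return zlib.crc32(f"{seed}:{ngram}".encode("utf-8")) % NUM_SLOTS


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Random tables stand in for trained parameters (assumption).
keys = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_SLOTS)]
values = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_SLOTS)]
query = [rng.gauss(0, 1) for _ in range(DIM)]


def embed(word):
    """Build a word vector as an attention-weighted sum of shared subword values."""
    grams = char_ngrams(word)
    k = [keys[slot(g, "k")] for g in grams]
    v = [values[slot(g, "v")] for g in grams]
    # Scaled dot product between the query and each subword key.
    scores = [sum(q * x for q, x in zip(query, ki)) / math.sqrt(DIM) for ki in k]
    weights = softmax(scores)
    return [sum(w * vi[d] for w, vi in zip(weights, v)) for d in range(DIM)]


vector = embed("unseenword")  # works for any string, whether or not it was seen before
```

In this sketch the parameter count is fixed at `2 * NUM_SLOTS * DIM + DIM` regardless of vocabulary size, because distinct n-grams that hash to the same slot share one vector.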
Pages: 3551-3564
Page count: 14
Source: IEEE/ACM Transactions on Audio Speech and Language Processing, 2021, 29: 3551-3564