Compressing Word Embeddings

Cited by: 8
Authors
Andrews, Martin [1]
Affiliations
[1] Red Cat Labs, Singapore, Singapore
Keywords
DOI
10.1007/978-3-319-46681-1_50
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using large-scale unlabelled text analysis. However, these representations typically consist of dense vectors that require a great deal of storage and cause the internal structure of the vector space to be opaque. A more 'idealized' representation of a vocabulary would be both compact and readily interpretable. With this goal, this paper first shows that Lloyd's algorithm can compress the standard dense vector representation by a factor of 10 without much loss in performance. Then, using that compressed size as a 'storage budget', we describe a new GPU-friendly factorization procedure to obtain a representation which gains interpretability as a side-effect of being sparse and non-negative in each encoding dimension. Word similarity and word-analogy tests are used to demonstrate the effectiveness of the compressed representations obtained.
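Lloyd's algorithm is the classical k-means iteration; applied to the scalar entries of an embedding matrix it yields a small codebook of levels, so each 32-bit float can be stored as a short code. The sketch below is an illustration of that idea under stated assumptions (toy random embeddings, 8 quantization levels, i.e. 3 bits per value versus 32, roughly the 10x budget the abstract cites), not the paper's exact procedure; all names and parameters are illustrative.

```python
import numpy as np

def lloyd_quantize(values, k=8, iters=20, seed=0):
    """1-D Lloyd's algorithm (k-means): quantize scalars to k codebook levels."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct input values.
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        codes = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned values.
        for j in range(k):
            if np.any(codes == j):
                centroids[j] = values[codes == j].mean()
    return codes, centroids

# Toy "embedding matrix": 1000 words x 50 dims of float32 values.
emb = np.random.default_rng(1).normal(size=(1000, 50)).astype(np.float32)
codes, levels = lloyd_quantize(emb.ravel(), k=8)
# Reconstruct the dense matrix from 3-bit codes plus the tiny codebook.
recon = levels[codes].reshape(emb.shape)
```

Storing `codes` (3 bits each) plus the 8-entry codebook in place of 32-bit floats gives roughly the order-of-magnitude compression described, at the cost of some reconstruction error.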
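The paper's GPU-friendly factorization procedure is its own contribution and is not reproduced here; as a generic illustration of how a sparse, non-negative encoding of embeddings can be obtained within a storage budget, the standard NMF multiplicative-update rule (Lee-Seung) is sketched below. The shift to non-negative input and every parameter shown are assumptions for the sketch, not the paper's method.

```python
import numpy as np

def nmf(X, r=16, iters=100, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates: X ~= W @ H with W, H >= 0 elementwise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, r))
    H = rng.random((r, d))
    for _ in range(iters):
        # Multiplicative updates preserve non-negativity by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# NMF needs non-negative input; shift toy embeddings by their global minimum.
emb = np.random.default_rng(1).normal(size=(200, 50))
X = emb - emb.min()
W, H = nmf(X, r=16)  # W: per-word non-negative codes, H: basis directions
```

Because each row of `W` is non-negative and tends toward sparsity, the active dimensions can be read as additive "topics", which is the interpretability side-effect the abstract describes.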
Pages: 413-422
Page count: 10
Related papers
50 records in total
  • [1] Compressing and interpreting word embeddings with latent space regularization and interactive semantics probing
    Li, Haoyu
    Wang, Junpeng
    Zheng, Yan
    Wang, Liang
    Zhang, Wei
    Shen, Han-Wei
    [J]. INFORMATION VISUALIZATION, 2023, 22 (01) : 52 - 68
  • [2] Socialized Word Embeddings
    Zeng, Ziqian
    Yin, Yichun
    Song, Yangqiu
    Zhang, Ming
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3915 - 3921
  • [3] Dynamic Word Embeddings
    Bamler, Robert
    Mandt, Stephan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [4] Urdu Word Embeddings
    Haider, Samar
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 964 - 968
  • [5] isiZulu Word Embeddings
    Dlamini, Sibonelo
    Jembere, Edgar
    Pillay, Anban
    van Niekerk, Brett
    [J]. 2021 CONFERENCE ON INFORMATION COMMUNICATIONS TECHNOLOGY AND SOCIETY (ICTAS), 2021, : 121 - 126
  • [6] Topical Word Embeddings
    Liu, Yang
    Liu, Zhiyuan
    Chua, Tat-Seng
    Sun, Maosong
    [J]. PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2418 - 2424
  • [7] Bias in Word Embeddings
    Papakyriakopoulos, Orestis
    Hegelich, Simon
    Serrano, Juan Carlos Medina
    Marco, Fabienne
    [J]. FAT* '20: PROCEEDINGS OF THE 2020 CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, 2020, : 446 - 457
  • [8] Relational Word Embeddings
    Camacho-Collados, Jose
    Espinosa-Anke, Luis
    Schockaert, Steven
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3286 - 3296
  • [9] Biomedical Word Sense Disambiguation with Word Embeddings
    Antunes, Rui
    Matos, Sergio
    [J]. 11TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS, 2017, 616 : 273 - 279
  • [10] Overcoming Poor Word Embeddings with Word Definitions
    Malon, Christopher
    [J]. 10TH CONFERENCE ON LEXICAL AND COMPUTATIONAL SEMANTICS (SEM 2021), 2021, : 288 - 293