Compressing Word Embeddings

Cited by: 8
Authors
Andrews, Martin [1]
Affiliations
[1] Red Cat Labs, Singapore, Singapore
Keywords
DOI
10.1007/978-3-319-46681-1_50
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using large-scale unlabelled text analysis. However, these representations typically consist of dense vectors that require a great deal of storage and cause the internal structure of the vector space to be opaque. A more 'idealized' representation of a vocabulary would be both compact and readily interpretable. With this goal, this paper first shows that Lloyd's algorithm can compress the standard dense vector representation by a factor of 10 without much loss in performance. Then, using that compressed size as a 'storage budget', we describe a new GPU-friendly factorization procedure to obtain a representation which gains interpretability as a side-effect of being sparse and non-negative in each encoding dimension. Word similarity and word-analogy tests are used to demonstrate the effectiveness of the compressed representations obtained.
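Lloyd's algorithm is the classical k-means iteration; applied to the scalar entries of an embedding matrix it yields a small codebook of levels, so each 32-bit float can be stored as a short code. The sketch below is an illustration of that idea under stated assumptions (toy random embeddings, 8 quantization levels, i.e. 3 bits per value versus 32, roughly the 10x budget the abstract cites), not the paper's exact procedure; all names and parameters are illustrative.

```python
import numpy as np

def lloyd_quantize(values, k=8, iters=20, seed=0):
    """1-D Lloyd's algorithm (k-means): quantize scalars to k codebook levels."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct input values.
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        codes = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned values.
        for j in range(k):
            if np.any(codes == j):
                centroids[j] = values[codes == j].mean()
    return codes, centroids

# Toy "embedding matrix": 1000 words x 50 dims of float32 values.
emb = np.random.default_rng(1).normal(size=(1000, 50)).astype(np.float32)
codes, levels = lloyd_quantize(emb.ravel(), k=8)
# Reconstruct the dense matrix from 3-bit codes plus the tiny codebook.
recon = levels[codes].reshape(emb.shape)
```

Storing `codes` (3 bits each) plus the 8-entry codebook in place of 32-bit floats gives roughly the order-of-magnitude compression described, at the cost of some reconstruction error.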
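The paper's GPU-friendly factorization procedure is its own contribution and is not reproduced here; as a generic illustration of how a sparse, non-negative encoding of embeddings can be obtained within a storage budget, the standard NMF multiplicative-update rule (Lee-Seung) is sketched below. The shift to non-negative input and every parameter shown are assumptions for the sketch, not the paper's method.

```python
import numpy as np

def nmf(X, r=16, iters=100, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates: X ~= W @ H with W, H >= 0 elementwise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, r))
    H = rng.random((r, d))
    for _ in range(iters):
        # Multiplicative updates preserve non-negativity by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# NMF needs non-negative input; shift toy embeddings by their global minimum.
emb = np.random.default_rng(1).normal(size=(200, 50))
X = emb - emb.min()
W, H = nmf(X, r=16)  # W: per-word non-negative codes, H: basis directions
```

Because each row of `W` is non-negative and tends toward sparsity, the active dimensions can be read as additive "topics", which is the interpretability side-effect the abstract describes.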
Pages: 413-422
Page count: 10
Related papers
50 records in total
  • [1] Compressing and interpreting word embeddings with latent space regularization and interactive semantics probing
    Li, Haoyu
    Wang, Junpeng
    Zheng, Yan
    Wang, Liang
    Zhang, Wei
    Shen, Han-Wei
    [J]. INFORMATION VISUALIZATION, 2023, 22 (01) : 52 - 68
  • [2] Socialized Word Embeddings
    Zeng, Ziqian
    Yin, Yichun
    Song, Yangqiu
    Zhang, Ming
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3915 - 3921
  • [3] Dynamic Word Embeddings
    Bamler, Robert
    Mandt, Stephan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [4] Urdu Word Embeddings
    Haider, Samar
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 964 - 968
  • [5] isiZulu Word Embeddings
    Dlamini, Sibonelo
    Jembere, Edgar
    Pillay, Anban
    van Niekerk, Brett
    [J]. 2021 CONFERENCE ON INFORMATION COMMUNICATIONS TECHNOLOGY AND SOCIETY (ICTAS), 2021, : 121 - 126
  • [6] Topical Word Embeddings
    Liu, Yang
    Liu, Zhiyuan
    Chua, Tat-Seng
    Sun, Maosong
    [J]. PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2418 - 2424
  • [7] Bias in Word Embeddings
    Papakyriakopoulos, Orestis
    Hegelich, Simon
    Serrano, Juan Carlos Medina
    Marco, Fabienne
    [J]. FAT* '20: PROCEEDINGS OF THE 2020 CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, 2020, : 446 - 457
  • [8] Relational Word Embeddings
    Camacho-Collados, Jose
    Espinosa-Anke, Luis
    Schockaert, Steven
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3286 - 3296
  • [9] Biomedical Word Sense Disambiguation with Word Embeddings
    Antunes, Rui
    Matos, Sergio
    [J]. 11TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS, 2017, 616 : 273 - 279
  • [10] Overcoming Poor Word Embeddings with Word Definitions
    Malon, Christopher
    [J]. 10TH CONFERENCE ON LEXICAL AND COMPUTATIONAL SEMANTICS (SEM 2021), 2021, : 288 - 293