Visual Exploration of Semantic Relationships in Neural Word Embeddings

Cited by: 61
Authors
Liu, Shusen [1]
Bremer, Peer-Timo [1]
Thiagarajan, Jayaraman J. [1]
Srikumar, Vivek [3]
Wang, Bei [2]
Livnat, Yarden [2]
Pascucci, Valerio [2]
Affiliations
[1] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
[2] Univ Utah, SCI Inst, Salt Lake City, UT 84112 USA
[3] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA
Funding
National Science Foundation (USA)
Keywords
Natural Language Processing; Word Embedding; High-Dimensional Data; Dimensionality Reduction; Visualization; Quality
DOI
10.1109/TVCG.2017.2745141
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). However, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. In particular, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. Here, we introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.
Pages: 553-562
Page count: 10