Enhancing Domain Word Embedding via Latent Semantic Imputation

Cited by: 9
Authors
Yao, Shibo [1]
Yu, Dantong [1]
Xiao, Keli [2]
Affiliations
[1] New Jersey Inst Technol, Newark, NJ 07102 USA
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
Keywords
representation learning; graph; manifold learning; spectral methods; dimensionality reduction
DOI
10.1145/3292500.3330926
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We present a novel method named Latent Semantic Imputation (LSI) that transfers external knowledge into the semantic space to enhance word embedding. The method uses graph theory to extract the latent manifold structure of the entities in the affinity space, and it leverages non-negative least squares with standard simplex constraints together with the power iteration method to derive spectral embeddings. It provides an effective and efficient approach to combining entity representations defined in different Euclidean spaces. Specifically, our approach generates and imputes reliable embedding vectors for low-frequency words in the semantic space and benefits downstream language tasks that depend on word embedding. We conduct comprehensive experiments on a carefully designed classification problem and on language modeling, and we demonstrate the superiority of the embedding enhanced via LSI over several well-known benchmark embeddings. We also confirm the consistency of the results under different parameter settings of our method.
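The pipeline named in the abstract (a graph over the affinity space, non-negative least squares weights constrained to the standard simplex, then power iteration) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `simplex_weights` and `impute_embeddings`, the plain k-nearest-neighbour graph, the soft penalty used to approximate the sum-to-one constraint, and all parameter defaults are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import nnls


def simplex_weights(target, neighbors, penalty=1e3):
    """Approximate argmin_w ||target - w @ neighbors||^2 with w >= 0, sum(w) = 1.

    Hypothetical helper: the sum-to-one constraint is enforced softly by a
    heavily weighted extra row, then the result is renormalised.
    """
    k = len(neighbors)
    A = np.vstack([neighbors.T, penalty * np.ones(k)])
    b = np.append(target, penalty)
    w, _ = nnls(A, b)
    total = w.sum()
    # Fall back to uniform weights if NNLS returns the zero vector.
    return w / total if total > 0 else np.full(k, 1.0 / k)


def impute_embeddings(affinity, known_emb, known_mask, k=3, n_iter=200, tol=1e-8):
    """Impute missing embeddings from neighbours in the affinity space.

    affinity   : (n, d_aff) features for every entity (assumed external knowledge)
    known_emb  : (n_known, d_emb) embeddings of the entities flagged in known_mask
    known_mask : (n,) boolean array, True where an embedding is already known
    """
    n = affinity.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        if known_mask[i]:
            continue  # known rows stay clamped, so they need no weights
        dist = np.linalg.norm(affinity - affinity[i], axis=1)
        dist[i] = np.inf  # exclude the point itself
        nbrs = np.argsort(dist)[:k]
        W[i, nbrs] = simplex_weights(affinity[i], affinity[nbrs])

    emb = np.zeros((n, known_emb.shape[1]))
    emb[known_mask] = known_emb
    for _ in range(n_iter):
        new = W @ emb                 # one power-iteration step
        new[known_mask] = known_emb   # keep the known embeddings fixed
        converged = np.abs(new - emb).max() < tol
        emb = new
        if converged:
            break
    return emb


if __name__ == "__main__":
    affinity = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
    mask = np.array([True, False, False, False, True])
    known = np.array([[0.0, 0.0], [4.0, 8.0]])
    print(impute_embeddings(affinity, known, mask, k=2))
```

In this toy setup the three unknown entities lie between two known ones, so each is reconstructed as a convex combination of its neighbours and the iteration settles on interpolated vectors. A plain k-nearest-neighbour graph does not guarantee that every unknown entity is connected to a known one, so the sketch may not converge to a meaningful result on arbitrary data.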
Pages: 557-565
Number of pages: 9
Related papers (50 in total)
  • [41] Combining Word Embedding and Lexical Database for Semantic Relatedness Measurement
    Lee, Yang-Yin
    Ke, Hao
    Huang, Hen-Hsen
    Chen, Hsin-Hsi
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16 COMPANION), 2016, : 73 - 74
  • [42] Study on Semantic Transparency of Chinese Compounds Based on Word Embedding
    Tang, Xuemei
    Liang, Shichen
    2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020), 2020, : 130 - 134
  • [43] Predicting Gene Functional Interactions Using Semantic Word Embedding
    Roy, Arpita
    Pan, Shimei
    2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016, : 509 - 513
  • [44] Math-word embedding in math search and semantic extraction
    André Greiner-Petter
    Abdou Youssef
    Terry Ruas
    Bruce R. Miller
    Moritz Schubotz
    Akiko Aizawa
    Bela Gipp
    Scientometrics, 2020, 125 : 3017 - 3046
  • [45] Modelling the Semantic Change Dynamics using Diachronic Word Embedding
    Boukhaled, Mohamed Amine
    Fagard, Benjamin
    Poibeau, Thierry
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE (ICAART), VOL 2, 2019, : 944 - 951
  • [46] The Semantic Similarity Relation of Entities Discovery: Using Word Embedding
    Ruan, Dong-ru
    Mao, Yu-xin
    Pan, Hong-yan
    Gao, Kai
    2017 9TH INTERNATIONAL CONFERENCE ON MODELLING, IDENTIFICATION AND CONTROL (ICMIC 2017), 2017, : 845 - 850
  • [47] Math-word embedding in math search and semantic extraction
    Greiner-Petter, Andre
    Youssef, Abdou
    Ruas, Terry
    Miller, Bruce R.
    Schubotz, Moritz
    Aizawa, Akiko
    Gipp, Bela
    SCIENTOMETRICS, 2020, 125 (03) : 3017 - 3046
  • [48] Word Embedding based Textual Semantic Similarity Measure in Bengali
    Iqbal, Md Asif
    Sharif, Omar
    Hoque, Mohammed Moshiul
    Sarker, Iqbal H.
    10TH INTERNATIONAL YOUNG SCIENTISTS CONFERENCE IN COMPUTATIONAL SCIENCE (YSC2021), 2021, 193 : 92 - 101
  • [49] Semantic Similarity of Inverse Morpheme Words Based on Word Embedding
    Zhou, Jiaomei
    Liu, Zhiying
    CHINESE LEXICAL SEMANTICS, CLSW 2021, PT I, 2022, 13249 : 452 - 463
  • [50] A survey on word embedding techniques and semantic similarity for paraphrase identification
    Kubal, Divesh R.
    Nimkar, Anant V.
    International Journal of Computational Systems Engineering, 2019, 5 (01) : 36 - 52