Enhancing Domain Word Embedding via Latent Semantic Imputation

Cited by: 9
Authors
Yao, Shibo [1]
Yu, Dantong [1]
Xiao, Keli [2]
Affiliations
[1] New Jersey Inst Technol, Newark, NJ 07102 USA
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
Keywords
representation learning; graph; manifold learning; spectral methods; dimensionality reduction
DOI
10.1145/3292500.3330926
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We present a novel method named Latent Semantic Imputation (LSI) to transfer external knowledge into semantic space for enhancing word embedding. The method integrates graph theory to extract the latent manifold structure of the entities in the affinity space, and leverages non-negative least squares with standard simplex constraints and the power iteration method to derive spectral embeddings. It provides an effective and efficient approach to combining entity representations defined in different Euclidean spaces. Specifically, our approach generates and imputes reliable embedding vectors for low-frequency words in the semantic space and benefits downstream language tasks that depend on word embedding. We conduct comprehensive experiments on a carefully designed classification problem and on language modeling, and demonstrate the superiority of the embedding enhanced via LSI over several well-known benchmark embeddings. We also confirm the consistency of the results under different parameter settings of our method.
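The pipeline named in the abstract (neighbor graph in the affinity space, simplex-constrained non-negative least squares, power iteration) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the function names, the plain k-nearest-neighbor graph, the projected-gradient solver, and the toy data are all assumptions made for illustration, and the abstract does not specify how the graph is built.

```python
# Illustrative sketch only: names, the k-NN graph, and the solver are assumptions.

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:  # holds exactly for the leading entries that stay positive
            theta = t
    return [max(x - theta, 0.0) for x in v]


def simplex_weights(target, neighbors, steps=500):
    """Minimize ||target - sum_j w_j * neighbors[j]||^2 over the simplex
    with projected gradient descent (one possible solver, assumed here)."""
    m = len(neighbors)
    w = [1.0 / m] * m
    # 2 * squared Frobenius norm upper-bounds the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(x * x for row in neighbors for x in row) or 1.0
    for _ in range(steps):
        recon = [sum(w[j] * neighbors[j][d] for j in range(m))
                 for d in range(len(target))]
        residual = [r - t for r, t in zip(recon, target)]
        grad = [2.0 * sum(res * neighbors[j][d] for d, res in enumerate(residual))
                for j in range(m)]
        w = project_to_simplex([w_j - g / lipschitz for w_j, g in zip(w, grad)])
    return w


def impute(affinity, known, k=2, iters=200):
    """affinity: name -> feature vector in the affinity space (all entities).
    known: name -> embedding in the semantic space (a subset of entities).
    Returns embeddings for every entity; known rows are never modified."""
    names = list(affinity)
    dim = len(next(iter(known.values())))
    emb = {n: list(known.get(n, [0.0] * dim)) for n in names}
    weights = {}
    for n in names:
        if n in known:
            continue
        # k nearest neighbors by squared Euclidean distance in the affinity space.
        others = sorted(
            (o for o in names if o != n),
            key=lambda o: sum((a - b) ** 2 for a, b in zip(affinity[n], affinity[o])),
        )[:k]
        w = simplex_weights(affinity[n], [affinity[o] for o in others])
        weights[n] = list(zip(others, w))
    # Power iteration: repeatedly replace each unknown row by the weighted
    # average of its neighbors' current embeddings.
    for _ in range(iters):
        updates = {n: [sum(w * emb[o][d] for o, w in nbrs) for d in range(dim)]
                   for n, nbrs in weights.items()}
        emb.update(updates)
    return emb


# Toy data (hypothetical): "c" has affinity features but no embedding yet.
affinity = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [0.5, 0.0]}
known = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
result = impute(affinity, known, k=2)
print(result["c"])
```

In this toy case the entity "c" sits a quarter of the way from "a" to "b" in the affinity space, so its imputed embedding should land near [0.75, 0.25]. The iteration only yields meaningful values when every unknown entity is connected, directly or through other unknown entities, to at least one known entity; a plain k-NN graph does not guarantee that.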
Pages: 557-565
Page count: 9