A Comparative Study of Methods for Visualizable Semantic Embedding of Small Text Corpora

Cited by: 3
Authors
Choudhary, Rishabh [1]
Doboli, Simona [2]
Minai, Ali A. [1]
Affiliations
[1] Univ Cincinnati, Dept Elect Engn & Comp Sci, Cincinnati, OH 45221 USA
[2] Hofstra Univ, Dept Comp Sci, Hempstead, NY 11550 USA
Keywords
semantic spaces; text embedding; language models; semantic visualization; REPRESENTATIONS; BRAIN
DOI
10.1109/IJCNN52387.2021.9534250
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text embedding has recently emerged as a very useful and successful method for semantic representation. Following initial word-level embedding methods such as Latent Semantic Analysis (LSA) and topic-based bag-of-words approaches like Latent Dirichlet Allocation (LDA), the focus has turned to language models and text encoders implemented as neural networks, ranging from word-level models to those embedding whole documents. The distinctive feature of these models is their ability to infer semantic spaces at all levels based purely on data, with no need for complexities such as syntactic analysis or ontology building. Many of these models are available pretrained on enormous amounts of data, providing downstream applications with general-purpose semantic spaces. In particular, embedding models at the sentence level or higher are most useful in applications because the meaning of text only becomes clear at that level. Most text embedding methods produce embeddings in high-dimensional spaces, with dimensionality ranging from a few hundred to several thousand. However, it is often useful to visualize semantic spaces in a very low-dimensional space, which requires the use of dimensionality reduction methods. It is not clear which language models and which dimensionality reduction methods work well in these cases. In this paper, we compare four text embedding methods in combination with three methods of dimensionality reduction to map three related real-world datasets comprising textual descriptions of items in a particular domain (sports) to a 2-dimensional semantic visualization space. The results provide several insights into the utility of these methods for data of this type.
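The two-stage pipeline the abstract describes (embed each text in a high-dimensional space, then reduce to 2 dimensions for plotting) can be illustrated with a minimal sketch. The abstract does not name the four embedding methods or the three reduction methods compared in the paper, so this sketch substitutes stand-ins that are assumptions, not the authors' choices: a plain bag-of-words count vector as the embedding and PCA (computed by power iteration) as the reduction. The four toy sports descriptions and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Embed each document as a term-count vector over the corpus vocabulary.
    (Stand-in embedding; an assumption, not a method named in the abstract.)"""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[t]) for t in vocab])
    return vectors, vocab

def center(rows):
    """Subtract the per-dimension mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_component(rows, iters=200):
    """Leading principal direction via power iteration on X^T X."""
    dim = len(rows[0])
    v = [float(i + 1) for i in range(dim)]  # deterministic, non-uniform start
    for _ in range(iters):
        scores = [sum(x * c for x, c in zip(row, v)) for row in rows]  # X v
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in the data
            return [0.0] * dim
        v = [x / norm for x in w]
    return v

def pca_2d(rows):
    """Project rows onto their first two principal components."""
    x = center(rows)
    coords = [[] for _ in rows]
    for _ in range(2):
        v = top_component(x)
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        for c, s in zip(coords, scores):
            c.append(s)
        # Deflate: remove the direction just found before the next pass.
        x = [[a - s * b for a, b in zip(row, v)] for row, s in zip(x, scores)]
    return coords

# Hypothetical toy corpus: two soccer descriptions, two tennis descriptions.
docs = [
    "soccer ball goal team",
    "soccer team goal keeper",
    "tennis racket serve net",
    "tennis serve net match",
]
vectors, vocab = bag_of_words(docs)
coords = pca_2d(vectors)
for doc, (px, py) in zip(docs, coords):
    print(f"{px:+.3f} {py:+.3f}  {doc}")
```

In this toy case the two soccer descriptions land closer to each other in the 2-dimensional space than to either tennis description, which is the kind of visual grouping the paper evaluates. Any other embedding or reduction method could be swapped in by replacing `bag_of_words` or `pca_2d`.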
Pages: 8