Beyond word embeddings: A survey

Cited by: 7
Authors
Incitti, Francesca [1]
Urli, Federico [1]
Snidaro, Lauro [1]
Affiliation
[1] University of Udine, Department of Mathematics, Computer Science and Physics, Udine, Italy
Keywords
NLP; Text representation; Document embeddings; Sentence embeddings; Transformers; Multimodal embeddings; Document classification; Text; Representations; Knowledge; Networks
DOI
10.1016/j.inffus.2022.08.024
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper provides an overview of methods for representing text. It focuses on embeddings for texts of different lengths, and specifically on work that goes beyond word embeddings. Analyzing longer pieces of text is more challenging than analyzing single words, because several additional factors come into play. Representations of longer texts can therefore be obtained with a variety of strategies that exploit more information than is used for single words. A text is defined both by its components and by how they are combined, and both aspects should be taken into account when integrating information into a single document embedding. Multimodal approaches are also described, showing how information of different natures (aural, visual, and knowledge-based) can be fused to obtain enriched representations. The survey aims to help readers navigate the methods proposed in the literature and understand which strategies best suit specific needs.
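The point that a text depends on how its components are combined can be made concrete with a toy comparison. The minimal sketch below is not taken from the survey. It contrasts averaging word vectors, which discards word order, with a crude position-weighted average. The three-dimensional vectors in `WORD_VECS` and the weighting in `position_weighted_pool` are hypothetical and chosen only for illustration.

```python
# Toy illustration (hypothetical vectors): mean pooling vs. a simple order-aware combination.
# Neither function reproduces a specific method from the survey.
from typing import Dict, List

Vector = List[float]

# Hypothetical 3-dimensional word embeddings, invented for this example only.
WORD_VECS: Dict[str, Vector] = {
    "dog": [1.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0],
    "man": [0.0, 0.0, 1.0],
}


def mean_pool(tokens: List[str]) -> Vector:
    """Average the word vectors; the result is independent of word order."""
    vecs = [WORD_VECS[t] for t in tokens]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def position_weighted_pool(tokens: List[str]) -> Vector:
    """Weight each word vector by its 1-based position and divide by the sum of weights.

    A deliberately simple (hypothetical) order-aware scheme.
    """
    vecs = [WORD_VECS[t] for t in tokens]
    weights = list(range(1, len(vecs) + 1))
    total = sum(weights)
    dim = len(vecs[0])
    return [sum(w * v[i] for w, v in zip(weights, vecs)) / total for i in range(dim)]


if __name__ == "__main__":
    a = "dog bites man".split()
    b = "man bites dog".split()
    print(mean_pool(a) == mean_pool(b))  # True: same words give the same mean
    print(position_weighted_pool(a) == position_weighted_pool(b))  # False: order now matters
```

Mean pooling maps "dog bites man" and "man bites dog" to the same vector. Even this crude weighting separates the two sentences. The approaches covered in the survey, such as Transformer-based encoders, capture composition in far more sophisticated ways.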
Pages: 418-436
Page count: 19
Related papers
  • [1] A survey on training and evaluation of word embeddings
    Torregrossa, François
    Allesiardo, Robin
    Claveau, Vincent
    Kooli, Nihel
    Gravier, Guillaume
    [J]. International Journal of Data Science and Analytics, 2021, 11(2): 85-103
  • [2] A survey of word embeddings for clinical text
    Khattak, Faiza Khan
    Jeblee, Serena
    Pou-Prom, Chloé
    Abdalla, Mohamed
    Meaney, Christopher
    Rudzicz, Frank
    [J]. Journal of Biomedical Informatics: X, 2019, 4
  • [3] A survey of word embeddings based on deep learning
    Wang, Shirui
    Zhou, Wenan
    Jiang, Chao
    [J]. Computing, 2020, 102(3): 717-740
  • [4] Word embeddings for biomedical natural language processing: A survey
    Chiu, Billy
    Baker, Simon
    [J]. Language and Linguistics Compass, 2020, 14(12)
  • [5] Continuous-Space Language Processing: Beyond Word Embeddings
    Ostendorf, Mari
    [C]. Statistical Language and Speech Processing (SLSP 2016), 2016, 9918: 3-15
  • [6] From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
    Camacho-Collados, Jose
    Pilehvar, Mohammad Taher
    [J]. Journal of Artificial Intelligence Research, 2018, 63: 743-788
  • [7] Beyond Word Embeddings: Temporal Representations of Words using Google Trends
    Haque, Md Enamul
    Maiti, Aniruddha
    Tozal, Mehmet Engin
    [C]. 2021 IEEE 15th International Conference on Semantic Computing (ICSC 2021), 2021: 280-287
  • [8] Beyond word2vec: Distance-graph Tensor Factorization for Word and Document Embeddings
    Wang, Suhang
    Aggarwal, Charu
    Liu, Huan
    [C]. Proceedings of the 28th ACM International Conference on Information & Knowledge Management (CIKM '19), 2019: 1041-1050