Generating Bags of Words from the Sums of Their Word Embeddings

Cited by: 3
Authors
White, Lyndon [1]
Togneri, Roberto [1]
Liu, Wei [1]
Bennamoun, Mohammed [1]
Affiliations
[1] Univ Western Australia, 35 Stirling Highway, Crawley, WA, Australia
Funding
Australian Research Council
DOI
10.1007/978-3-319-75477-2_5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many methods have been proposed to generate sentence vector representations, such as recursive neural networks, latent distributed memory models, and the simple sum of word embeddings (SOWE). However, very few methods demonstrate the ability to reverse the process, that is, to recover sentences from sentence embeddings. Amongst the many sentence embeddings, SOWE has been shown to maintain semantic meaning, so in this paper we introduce a method for moving from the SOWE representation back to the bag of words (BOW) of the original sentence. This is a partway step towards recovering the whole sentence and has useful theoretical and practical applications of its own. The conversion from vector to bag of words is done using a greedy algorithm. To our knowledge this is the first such work. It demonstrates qualitatively the ability to recreate the words from a large corpus based on its sentence embeddings. Beyond the practical benefit of allowing classical information retrieval methods to be combined with more recent methods that use sums of word embeddings, the success of this method has theoretical implications for the degree of information maintained by the sum-of-embeddings representation. This lends some credence to the consideration of the SOWE as a dimensionality-reduced and meaning-enhanced data manifold for the bag of words.
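The abstract does not spell out the greedy algorithm, so the following is only a minimal sketch of one plausible greedy-addition scheme, not the authors' implementation. It repeatedly adds whichever vocabulary word most reduces the Euclidean distance between the running sum and the target SOWE vector, and stops when no word improves the fit. The function name `greedy_bow`, the parameters `max_words` and `tol`, and the toy three-dimensional embeddings are all assumptions made for illustration.

```python
import math


def greedy_bow(target, vocab, max_words=20, tol=1e-9):
    """Greedily choose words whose embeddings sum close to `target`.

    `vocab` maps each word to its embedding (a list of floats).
    Returns the recovered bag of words as a sorted list; a word may
    appear more than once. Hypothetical helper, for illustration only.
    """
    residual = list(target)  # part of the target not yet explained
    best_dist = math.sqrt(sum(x * x for x in residual))  # current error
    bag = []
    for _ in range(max_words):
        best_word = None
        for word, vec in vocab.items():
            # Error that would remain if `word` were added to the bag.
            d = math.dist(residual, vec)
            if d < best_dist - tol:
                best_dist, best_word = d, word
        if best_word is None:  # no word brings the sum closer: stop
            break
        bag.append(best_word)
        residual = [r - v for r, v in zip(residual, vocab[best_word])]
    return sorted(bag)


# Toy embeddings (assumed values, not taken from any trained model).
vocab = {
    "cat": [1.0, 0.0, 0.0],
    "sat": [0.0, 1.0, 0.0],
    "mat": [0.0, 0.0, 1.0],
    "dog": [-1.0, 0.5, 0.0],
}

# SOWE of the bag {cat, cat, sat} is [2, 1, 0].
print(greedy_bow([2.0, 1.0, 0.0], vocab))  # ['cat', 'cat', 'sat']
```

A purely additive greedy search like this can commit to a wrong word when embeddings are strongly correlated, so a practical method would likely need some way to revise earlier choices.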
Pages: 91-102
Page count: 12