Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences

Times cited: 1
Authors
An, Yuan [1]
Kalinowski, Alexander [1]
Greenberg, Jane [1]
Affiliation
[1] Drexel University, College of Computing & Informatics, Metadata Research Center, Philadelphia, PA 19104, USA
Source
2021 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT DATA SCIENCE TECHNOLOGIES AND APPLICATIONS (IDSTA), 2021
Keywords
Sentence Embedding; Embedding Space Analysis; Clustering Analysis; Network Analysis
DOI
10.1109/IDSTA53674.2021.9660801
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentence embedding methods offer a powerful approach for working with short textual constructs or sequences of words. Representing sentences as dense numerical vectors has improved the performance of many natural language processing (NLP) applications. However, relatively little is understood about the latent structure of sentence embeddings. In particular, prior research has not addressed whether the length and structure of sentences affect the sentence embedding space and its topology. This paper reports a comprehensive set of clustering and network analyses targeting sentence and sub-sentence embedding spaces. Results show that one method generates the most clusterable embeddings. In general, the embeddings of span sub-sentences have better clustering properties than those of the original sentences. The results have implications for future sentence embedding models and applications.
Pages: 138-145
Number of pages: 8