Tens-embedding: A Tensor-based document embedding method

Cited by: 7
Authors
Rahimi, Zahra [1]
Homayounpour, Mohammad Mehdi [1]
Affiliations
[1] Amirkabir Univ Technol, Dept Comp Engn & Informat Technol, 350 Hafez Ave,Valiasr Sq, Tehran, Iran
Funding
National Science Foundation (USA);
Keywords
Natural language processing; Text classification; Text representation; Document embeddings; Tensor factorization; Topic modeling;
DOI
10.1016/j.eswa.2020.113770
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
A human can understand and classify a text, but a computer can capture the underlying semantics only when the text is represented in a form it can process. Text representation is therefore a fundamental stage in natural language processing (NLP). One of the main drawbacks of existing text representation approaches is that they use only one aspect, or view, of a text: for example, they consider a text only through its words, although topic information can be extracted from it as well. The term-document and document-topic matrices are two views of a text and contain complementary information. We use the strengths of both views to extract a richer representation. In this paper, we propose three text representation methods that combine these two matrices through tensor factorization. The proposed approach (Tens-Embedding) was applied to text classification, sentence-level and document-level sentiment analysis, and text clustering. Experiments on the 20 Newsgroups, R52, R8, MR and IMDB datasets indicated that the proposed method is superior to the other document embedding techniques it was compared with. (C) 2020 Elsevier Ltd. All rights reserved.
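The abstract does not spell out how the two views are combined, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a three-way tensor built as the per-document outer product of a term-document row and a document-topic row, and it swaps in a truncated SVD of the document-mode unfolding (the mode-1 factor of an HOSVD) as the tensor factorization. The function name `doc_embeddings`, the tensor construction, and the random toy inputs are all hypothetical.

```python
import numpy as np


def doc_embeddings(term_doc, doc_topic, rank):
    """Return rank-dimensional document vectors from two views of a corpus.

    term_doc:  (n_docs, n_terms) array, e.g. term counts (hypothetical input)
    doc_topic: (n_docs, n_topics) array, e.g. topic proportions (hypothetical input)
    """
    # Assumed construction (not taken from the paper):
    # T[d, w, k] = term_doc[d, w] * doc_topic[d, k]
    tensor = np.einsum("dw,dk->dwk", term_doc, doc_topic)
    # Mode-1 unfolding: one row per document, columns are (term, topic) pairs
    unfolded = tensor.reshape(tensor.shape[0], -1)
    # Truncated SVD of the unfolding; left singular vectors scaled by the
    # singular values serve as document coordinates
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank] * s[:rank]


# Toy data standing in for real term-document and document-topic matrices
rng = np.random.default_rng(0)
term_doc = rng.poisson(1.0, size=(6, 10)).astype(float)
doc_topic = rng.dirichlet(np.ones(3), size=6)

emb = doc_embeddings(term_doc, doc_topic, rank=2)
print(emb.shape)
```

In practice the document-topic matrix would come from a topic model and the resulting vectors would be passed to a classifier or clustering algorithm; a CP or Tucker decomposition could replace the SVD step.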
Pages: 14