Tens-embedding: A Tensor-based document embedding method

Cited by: 7
Authors
Rahimi, Zahra [1]
Homayounpour, Mohammad Mehdi [1]
Affiliations
[1] Amirkabir Univ Technol, Dept Comp Engn & Informat Technol, 350 Hafez Ave,Valiasr Sq, Tehran, Iran
Funding
US National Science Foundation;
Keywords
Natural language processing; Text classification; Text representation; Document embeddings; Tensor factorization; Topic modeling;
DOI
10.1016/j.eswa.2020.113770
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A human can understand and classify a text, but a computer can capture the underlying semantics of a text only when the text is represented in a form it can process. Text representation is a fundamental stage in natural language processing (NLP). One of the main drawbacks of existing text representation approaches is that they use only one aspect, or view, of a text: for example, they consider a text only through its words, although topic information can be extracted from the text as well. The term-document and document-topic matrices are two views of a text and contain complementary information. We use the strengths of both views to extract a richer representation. In this paper, we propose three different text representation methods that combine these two matrices through tensor factorization. The proposed approach (Tens-Embedding) was applied to text classification, sentence-level and document-level sentiment analysis, and text clustering. Experiments on the 20 Newsgroups, R52, R8, MR, and IMDB datasets indicated the superiority of the proposed method over other document embedding techniques. (C) 2020 Elsevier Ltd. All rights reserved.
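The abstract does not say how the two views are combined, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes (hypothetically) that a documents x terms x topics tensor is formed by multiplying each document's term counts with its topic weights, and it substitutes a truncated SVD of the mode-1 unfolding for whatever tensor factorization the authors use. The toy matrices and the helper names `build_tensor` and `document_embeddings` are made up for illustration.

```python
import numpy as np

# Toy term-document counts: 4 documents x 5 vocabulary terms (made-up data).
term_doc = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

# Toy document-topic proportions: 4 documents x 2 topics (made-up data,
# standing in for the output of a topic model such as LDA).
doc_topic = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
])


def build_tensor(term_doc, doc_topic):
    """Combine both views into a documents x terms x topics tensor.

    Entry (d, w, k) is count(d, w) * topic_weight(d, k). This construction
    is an assumption made for illustration only.
    """
    return np.einsum("dw,dk->dwk", term_doc, doc_topic)


def document_embeddings(tensor, rank):
    """Return rank-dimensional document vectors from the mode-1 unfolding.

    Takes a truncated SVD of the documents x (terms * topics) unfolding and
    keeps the leading left singular vectors scaled by their singular values.
    """
    n_docs = tensor.shape[0]
    unfolded = tensor.reshape(n_docs, -1)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank] * s[:rank]


tensor = build_tensor(term_doc, doc_topic)
embeddings = document_embeddings(tensor, rank=2)
print(tensor.shape)      # (4, 5, 2)
print(embeddings.shape)  # (4, 2)
```

In this toy setup, documents that share both vocabulary and topic weights end up close together in the embedding space, which is the kind of complementary signal the abstract attributes to using both views.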
Pages: 14
Related papers
50 in total
  • [41] A word based self-embedding scheme for document watermark
    Khen, Thien Vui
    Makur, Anamitra
    [J]. TENCON 2006 - 2006 IEEE REGION 10 CONFERENCE, VOLS 1-4, 2006, : 906 - +
  • [42] A Tensor-Based Method for Completion of Missing Electromyography Data
    Akmal, Muhammad
    Zubair, Syed
    Jochumsen, Mads
    Kamavuako, Ernest Nlandu
    Niazi, Imran Khan
    [J]. IEEE ACCESS, 2019, 7 : 104710 - 104720
  • [43] Tensor-Based Source Localization Method with EVS Array
    Guanjun Huang
    Yongquan Li
    Zijing Zhang
    Junpeng Shi
    Fangqing Wen
    [J]. Journal of Beijing Institute of Technology, 2021, 30 (04) : 352 - 362
  • [44] A tensor-based method for missing traffic data completion
    Tan, Huachun
    Feng, Guangdong
    Feng, Jianshuai
    Wang, Wuhong
    Zhang, Yu-Jin
    Li, Feng
    [J]. TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2013, 28 : 15 - 27
  • [45] Heterogeneous hypergraph embedding for document recommendation
    Zhu, Yu
    Guan, Ziyu
    Tan, Shulong
    Liu, Haifeng
    Cai, Deng
    He, Xiaofei
    [J]. NEUROCOMPUTING, 2016, 216 : 150 - 162
  • [46] Probabilistic Latent Document Network Embedding
    Le, Tuan M. V.
    Lauw, Hady W.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014, : 270 - 279
  • [47] Fine: Information embedding for document classification
    Carter, Kevin M.
    Raich, Raviv
    Hero, Alfred O., III
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 1861 - +
  • [48] Tensor-based Dinkelbach method for computing generalized tensor eigenvalues and its applications
    Chen, Haibin
    Zhu, Wenqi
    Cartis, Coralia
    [J]. arXiv,
  • [49] Word Mover's Embedding: From Word2Vec to Document Embedding
    Wu, Lingfei
    Yen, Ian En-Hsu
    Xu, Kun
    Xu, Fangli
    Balakrishnan, Avinash
    Chen, Pin-Yu
    Ravikumar, Pradeep
    Witbrock, Michael J.
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4524 - 4534
  • [50] An embedding method in image based on visual redundancy
    Xiaoyan, Qiao
    Ji, Guangong
    Liang, Hui
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS, VOLS 1-6, 2007, : 2969 - +