Linked Document Embedding for Classification

Cited by: 68
Authors
Wang, Suhang [1]
Tang, Jiliang [2]
Aggarwal, Charu [3]
Liu, Huan [1]
Affiliations
[1] Arizona State Univ, Comp Sci & Engn, Tempe, AZ 85281 USA
[2] Michigan State Univ, Comp Sci & Engn, E Lansing, MI 48824 USA
[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
Keywords
Document Embedding; Linked Data; Word Embedding
DOI
10.1145/2983323.2983755
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Word and document embedding algorithms such as Skip-gram and Paragraph Vector have been proven to help various text analysis tasks such as document classification, document clustering and information retrieval. The vast majority of these algorithms are designed to work with independent and identically distributed documents. However, in many real-world applications, documents are inherently linked. For example, web documents such as blogs and online news often have hyperlinks to other web documents, and scientific articles usually cite other articles. Linked documents present new challenges to traditional document embedding algorithms. In addition, most existing document embedding algorithms are unsupervised and their learned representations may not be optimal for classification when labeling information is available. In this paper, we study the problem of linked document embedding for classification and propose a linked document embedding framework LDE, which combines link and label information with content information to learn document representations for classification. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of link and label information in the proposed framework LDE.
Pages: 115-124
Page count: 10
Related papers
50 in total
  • [1] The Benefit of Document Embedding in Unsupervised Document Classification
    Novotny, Jaromir
    Ircing, Pavel
    [J]. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 470 - 478
  • [2] Fine: Information embedding for document classification
    Carter, Kevin M.
    Raich, Raviv
    Hero, Alfred O., III
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 1861 - +
  • [3] Orthogonal Locality Discriminant Embedding for Document Classification
    Wang, Ziqiang
    Sun, Xia
    [J]. 2009 FOURTH INTERNATIONAL CONFERENCE ON BIO-INSPIRED COMPUTING: THEORIES AND APPLICATIONS, PROCEEDINGS, 2009, : 170 - 174
  • [4] Document Sentiment Classification based on the Word Embedding
    Yin, Yanping
    Jin, Zhong
    [J]. PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MECHATRONICS, MATERIALS, CHEMISTRY AND COMPUTER ENGINEERING 2015 (ICMMCCE 2015), 2015, 39 : 456 - 461
  • [5] An Embedding-Based Topic Model for Document Classification
    Seifollahi, Sattar
    Piccardi, Massimo
    Jolfaei, Alireza
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (03)
  • [6] Pyramidal Stochastic Graphlet Embedding for Document Pattern Classification
    Dutta, Anjan
    Riba, Pau
    Llados, Josep
    Fornes, Alicia
    [J]. 2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 33 - 38
  • [7] Exploration of Document Classification with Linked Data and PageRank
    Dostal, Martin
    Nykl, Michal
    Jezek, Karel
    [J]. INTELLIGENT DISTRIBUTED COMPUTING VII, 2014, 511 : 37 - 43
  • [8] Document Embedding based Supervised Methods for Turkish Text Classification
    Celenli, Halil I.
    Ozturk, S. Talha
    Sahin, Gurkan
    Gerek, Aydin
    Ganiz, Murat C.
    [J]. 2018 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2018, : 477 - 482
  • [9] Linked Data Triples Enhance Document Relevance Classification
    Nagumothu, Dinesh
    Eklund, Peter W.
    Ofoghi, Bahadorreza
    Bouadjenek, Mohamed Reda
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (14):
  • [10] A Rule-Based Approach to Embedding Techniques for Text Document Classification
    Aubaid, Asmaa M.
    Mishra, Alok
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (11):