Probabilistic Latent Document Network Embedding

Cited by: 49
Authors
Le, Tuan M. V. [1]
Lauw, Hady W. [1]
Affiliations
[1] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
Keywords
document network; embedding; visualization; topic modeling; generative model; dimensionality reduction
DOI
10.1109/ICDM.2014.119
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A document network refers to a data type that can be represented as a graph of vertices, where each vertex is associated with a text document. Examples of such a data type include hyperlinked Web pages, academic publications with citations, and user profiles in social networks. Such data have very high-dimensional representations, in terms of text as well as network connectivity. In this paper, we study the problem of embedding, or finding a low-dimensional representation of a document network that "preserves" the data as much as possible. These embedded representations are useful for various applications driven by dimensionality reduction, such as visualization or feature selection. While previous works in embedding have mostly focused on either the textual aspect or the network aspect, we advocate a holistic approach by finding a unified low-rank representation for both aspects. Moreover, to lend semantic interpretability to the low-rank representation, we further propose to integrate topic modeling and embedding within a joint model. The gist is to join the various representations of a document (words, links, topics, and coordinates) within a generative model, and to estimate the hidden representations through MAP estimation. We validate our model on real-life document networks, showing that it outperforms comparable baselines comprehensively on objective evaluation metrics.
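As a rough illustration of what "joining words, links, topics, and coordinates within a generative model" and scoring it by MAP could look like, the sketch below computes a log-posterior for a toy document network. The abstract does not state the functional forms, so everything here is an assumption made for illustration: the distance-based softmax for topic proportions, the logistic link function with a `bias` parameter, the zero-mean Gaussian prior with variance `prior_var`, and all function and variable names. It is not the authors' model or implementation.

```python
import math

def topic_proportions(x, topic_coords):
    """Assumed form: softmax over negative half squared distances to topic coordinates."""
    scores = [-0.5 * sum((a - b) ** 2 for a, b in zip(x, phi)) for phi in topic_coords]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def link_probability(x_i, x_j, bias=1.0):
    """Assumed form: logistic function that decays with squared distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return 1.0 / (1.0 + math.exp(d2 - bias))

def log_posterior(doc_coords, topic_coords, topic_word, docs, links, prior_var=10.0):
    """Unnormalized log-posterior: word term + link term + Gaussian prior on coordinates."""
    total = 0.0
    # Word term: each word is drawn from a mixture of topic-word distributions.
    for x, words in zip(doc_coords, docs):
        theta = topic_proportions(x, topic_coords)
        for w in words:
            total += math.log(sum(t * beta[w] for t, beta in zip(theta, topic_word)))
    # Link term: one Bernoulli outcome per unordered document pair.
    n = len(doc_coords)
    for i in range(n):
        for j in range(i + 1, n):
            p = link_probability(doc_coords[i], doc_coords[j])
            total += math.log(p) if (i, j) in links else math.log(1.0 - p)
    # Prior term: zero-mean Gaussian on every document and topic coordinate.
    for point in list(doc_coords) + list(topic_coords):
        total -= sum(c * c for c in point) / (2.0 * prior_var)
    return total

# Hypothetical toy data: 3 documents, 4 vocabulary words, 2 topics, one link.
topic_coords = [(-1.0, 0.0), (1.0, 0.0)]
topic_word = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
docs = [[0, 1, 0], [1, 0, 1], [2, 3, 3]]  # word ids per document
links = {(0, 1)}                          # documents 0 and 1 are linked

# Linked, topically similar documents placed together vs. a scrambled layout.
good_layout = [(-1.0, 0.0), (-1.0, 0.2), (1.0, 0.0)]
bad_layout = [(1.0, 0.0), (-1.0, 0.0), (-1.0, 0.2)]

good_score = log_posterior(good_layout, topic_coords, topic_word, docs, links)
bad_score = log_posterior(bad_layout, topic_coords, topic_word, docs, links)
print("good layout scores higher:", good_score > bad_score)
```

Under these assumptions, a layout that places linked, topically similar documents near each other and near their dominant topic scores higher than a scrambled one. A MAP estimate would be obtained by maximizing this quantity over the coordinates and topic-word distributions with a numerical optimizer, which the sketch leaves out.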
Pages: 270-279
Page count: 10