Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation

Times cited: 13
Authors
Wei, Chao [1]
Luo, Senlin [1]
Ma, Xincheng [1]
Ren, Hao [1]
Zhang, Ji [1]
Pan, Limin [1]
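Affiliation
[1] Beijing Inst Technol, Beijing 10081, Peoples R China
Source
PLOS ONE | 2016 / Vol. 11 / Issue 01
Keywords
NETWORK;
DOI
10.1371/journal.pone.0146672
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline Classification Codes
07 ; 0710 ; 09 ;
Abstract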
Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology for document representation. However, such models presume that all documents are non-discriminatory, so the latent representation of each document depends on all other documents and the models cannot provide discriminative document representations. To address this problem, we propose a semi-supervised, manifold-inspired autoencoder that extracts meaningful latent representations of documents, taking the local perspective that the latent representations of nearby documents should be correlated. We first determine the set of discriminative neighbors using Euclidean distance in the observation space. The autoencoder is then trained by jointly minimizing the Bernoulli cross-entropy error between input and output and the sum of squared errors between the neighbors of the input and the output. Results on two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared with the baseline methods. This evidence demonstrates that our method can readily capture more discriminative latent representations of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features, which promotes the comprehensibility of the latent representation.
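The two training steps described in the abstract (neighbor selection by Euclidean distance, then joint minimization of a Bernoulli cross-entropy term and a neighbor squared-error term) can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. Assumed here and not taken from the record: binary bag-of-words inputs, a single sigmoid hidden layer, label -1 marking unlabeled documents, neighbors drawn only from same-label documents (one possible reading of "discriminative neighbors"), the squared-error term comparing each document's reconstruction with each of its neighbors, a weight `lam` on that term, and plain full-batch gradient descent. All names (`discriminative_neighbors`, `LocalAutoencoder`, `k`, `lam`, `lr`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def discriminative_neighbors(X, labels, k):
    """Return, for each document, the indices of its k nearest neighbors by
    Euclidean distance in the input (observation) space.

    Assumption: `labels` is an integer NumPy array; neighbors come only from
    documents with the same label, and unlabeled documents (label -1) get an
    empty neighbor set.
    """
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances, clipped at 0 to absorb rounding error.
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    idx = np.arange(len(labels))
    neighbors = []
    for i, lab in enumerate(labels):
        if lab < 0:
            neighbors.append(np.array([], dtype=int))
            continue
        cand = idx[(labels == lab) & (idx != i)]
        order = np.argsort(dist[i, cand], kind="stable")
        neighbors.append(cand[order[:k]])
    return neighbors


class LocalAutoencoder:
    """One-hidden-layer sigmoid autoencoder (hypothetical architecture)."""

    def __init__(self, n_in, n_hidden, lam=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.lam = lam  # weight of the neighbor term (assumed hyperparameter)

    def encode(self, X):
        return sigmoid(X @ self.W1 + self.b1)

    def forward(self, X):
        H = self.encode(X)
        return H, sigmoid(H @ self.W2 + self.b2)

    def loss(self, X, neighbors):
        """Mean over documents of BCE(x_i, xh_i) + lam * sum_j ||x_j - xh_i||^2."""
        _, Xh = self.forward(X)
        eps = 1e-12
        bce = -np.sum(X * np.log(Xh + eps) + (1 - X) * np.log(1 - Xh + eps))
        nbr = sum(np.sum((X[nb] - Xh[i]) ** 2) for i, nb in enumerate(neighbors))
        return (bce + self.lam * nbr) / len(X)

    def step(self, X, neighbors, lr=0.1):
        """One full-batch gradient-descent step on loss()."""
        H, Xh = self.forward(X)
        # BCE with a sigmoid output: gradient w.r.t. the output pre-activation.
        dZ2 = Xh - X
        # Neighbor term: d/dxh_i of sum_j ||x_j - xh_i||^2 = 2 * sum_j (xh_i - x_j).
        dXh = np.zeros_like(Xh)
        for i, nb in enumerate(neighbors):
            if len(nb):
                dXh[i] = 2.0 * (len(nb) * Xh[i] - X[nb].sum(axis=0))
        dZ2 += self.lam * dXh * Xh * (1 - Xh)  # chain rule through the sigmoid
        dZ2 /= len(X)
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ self.W2.T) * H * (1 - H)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
```

In use, one would compute `nbrs = discriminative_neighbors(X, labels, k)` once on the binary document-term matrix, call `model.step(X, nbrs)` repeatedly, and read the latent codes from `model.encode(X)`. Computing neighbors once in the input space, rather than in the latent space, follows the abstract's ordering of the two steps; the neighbor inputs are treated as fixed targets, so no gradient flows into them.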
Pages: 20