Probabilistic document correlation model

Cited by: 2
Authors
Jia, Xiping [1]
Peng, Hong [1]
Affiliations
[1] S China Univ Technol, Sch Engn & Comp Sci, Guangzhou 510640, Peoples R China
DOI
10.1109/CIS.Workshops.2007.65
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vector Space Model (VSM) and related models have recently become popular for document relationship analysis in text mining. However, they fail to discover document correlation at the topic level. This paper proposes a probabilistic document correlation (PDC) model that captures document correlation based on topics. The PDC model defines document correlation by the posterior probability of documents, and the posterior probability of each document is obtained by introducing the posterior probability of topics and topic similarity. Latent Dirichlet Allocation (LDA), a generative topic model, is used for topic retrieval. Experiments on correlated document search show that the PDC model outperforms the VSM in average retrieval precision and document compression.
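The contrast the abstract draws between word-level and topic-level correlation can be illustrated with a small sketch. This is not the paper's PDC formula, which the abstract does not give. It assumes a simple illustrative score, sum over k and l of P(z_k|d_i) * sim(z_k, z_l) * P(z_l|d_j), where sim is the cosine similarity of topic-word distributions. The vocabulary, the topic-word distributions, and the document-topic distributions are all made-up values standing in for LDA output, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_level_score(theta_i, theta_j, topic_word):
    """Illustrative score (an assumption, not the paper's definition):
    sum_k sum_l P(z_k|d_i) * sim(z_k, z_l) * P(z_l|d_j)."""
    total = 0.0
    for k, p_k in enumerate(theta_i):
        for l, p_l in enumerate(theta_j):
            total += p_k * cosine(topic_word[k], topic_word[l]) * p_l
    return total


# Hypothetical vocabulary: ["car", "automobile", "stock", "market"]
# Two documents about the same subject that share no words.
doc_a_counts = [3, 0, 0, 0]
doc_b_counts = [0, 2, 0, 0]

# Made-up topic-word distributions P(w|z), standing in for LDA output.
topic_word = [
    [0.5, 0.5, 0.0, 0.0],  # a "vehicles" topic
    [0.0, 0.0, 0.5, 0.5],  # a "finance" topic
]

# Made-up document-topic distributions P(z|d), standing in for LDA output.
theta_a = [0.9, 0.1]
theta_b = [0.8, 0.2]

# Word-level (VSM) cosine is zero because the documents share no terms.
vsm_score = cosine(doc_a_counts, doc_b_counts)

# The topic-level score is positive because both documents load on the same topic.
topic_score = topic_level_score(theta_a, theta_b, topic_word)

print("VSM cosine:", vsm_score)
print("Topic-level score:", round(topic_score, 4))
```

In this toy setting the two documents are unrelated under VSM but clearly related at the topic level, which is the kind of gap the abstract says the PDC model is meant to close.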
Pages: 433-436
Page count: 4