Probabilistic document correlation model

Cited by: 2
Authors
Jia, Xiping [1]
Peng, Hong [1]
Affiliations
[1] S China Univ Technol, Sch Engn & Comp Sci, Guangzhou 510640, Peoples R China
DOI
10.1109/CIS.Workshops.2007.65
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Vector Space Model (VSM) and related models have recently become popular for document relationship analysis in text mining. However, they fail to discover document correlation at the topic level. This paper proposes a probabilistic document correlation (PDC) model that captures document correlation based on topics. The PDC model defines document correlation in terms of the posterior probability of documents, and the posterior probability of each document is derived by introducing the posterior probability of topics and topic similarity. Latent Dirichlet Allocation (LDA), a generative topic model, is used for topic retrieval. Experiments on correlated document search show that the PDC model outperforms the VSM in average retrieval precision and document compression.
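The abstract does not give the PDC formulas, so the following is only a rough sketch of the general idea of scoring document pairs through topic posteriors and topic similarity. It is not the authors' method. The scoring rule (a similarity-weighted sum over topic pairs), the use of cosine similarity between topics, and all names and toy numbers are assumptions made for illustration; in practice the per-document topic distributions and per-topic word distributions would come from a fitted LDA model.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_similarity(topic_word):
    """Pairwise cosine similarity between topic-word distributions (assumed measure)."""
    return [[cosine(t, s) for s in topic_word] for t in topic_word]


def doc_correlation(theta_a, theta_b, sim):
    """Hypothetical score: sum over topic pairs of P(z_i|a) * sim[i][j] * P(z_j|b)."""
    return sum(
        pa * sim[i][j] * pb
        for i, pa in enumerate(theta_a)
        for j, pb in enumerate(theta_b)
    )


# Toy inputs, invented for illustration (not data from the paper).
# Rows: topics; columns: word probabilities. Topics 0 and 1 are close.
topic_word = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
]
# Per-document topic posteriors, as an LDA model might produce.
doc_topic = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.9, 0.0],
    "d3": [0.0, 0.1, 0.9],
}

sim = topic_similarity(topic_word)
score_12 = doc_correlation(doc_topic["d1"], doc_topic["d2"], sim)
score_13 = doc_correlation(doc_topic["d1"], doc_topic["d3"], sim)
print(score_12 > score_13)
```

In this toy setup d1 and d2 are dominated by different but similar topics, so the similarity-weighted score ranks d2 above d3 for d1. A plain dot product of the two topic vectors would miss most of that relationship, which is the kind of topic-level correlation the abstract says VSM-style comparisons fail to capture.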
Pages: 433-436
Page count: 4