DRDLC: Discovering Relevant Documents Using Latent Dirichlet Allocation and Cosine Similarity

Cited by: 4
Authors
Ramya, R. S. [1]
Singh, Ganesh T. [1]
Sejal, D. [1]
Venugopal, K. R. [1]
Iyengar, S. S. [2]
Patnaik, L. M. [3]
Affiliations
[1] Univ Visvesvaraya Coll Engn, Bengaluru, Karnataka, India
[2] Florida Int Univ, Miami, FL 33199 USA
[3] Indian Inst Sci, Bengaluru, Karnataka, India
Keywords
Pattern mining; query expansion; query search; text mining
DOI
10.1145/3301326.3301342
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, the availability of digital documents on the web has increased drastically, creating a need for effective methods to retrieve and organize them. Because data is dispersed globally and is unorganized, it is challenging to develop effective methods that can generate high-quality features from these documents. It is also necessary to reduce the gap between a user's search intention and the retrieved results, known as the semantic gap. This paper proposes Discovering Relevant Documents using Latent Dirichlet Allocation and Cosine Similarity (DRDLC). Word similarity is computed using Cosine Similarity (CS) over the documents in the search results. Latent Dirichlet Allocation (LDA) is applied to the extracted patterns and documents. Hashing is used to extract highly relevant documents efficiently. Further, term synonyms are identified using WordNet, and the documents are re-ranked. Experiments using the Relevance Feature Discovery (RFD) model on Reuters Corpus Volume-1 (RCV-1) show that the proposed DRDLC framework improves performance by returning more documents relevant to the user's input query.
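The cosine similarity step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the raw term-frequency weighting, and the helper names `tf_vector`, `cosine_similarity`, and `rank_documents` are all assumptions made for illustration, and the LDA, hashing, and WordNet stages are omitted.

```python
import math
from collections import Counter


def tf_vector(text):
    """Hypothetical helper: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (score, document) pairs, highest similarity first."""
    q = tf_vector(query)
    scored = [(cosine_similarity(q, tf_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "text mining of web documents",
        "stock market news",
        "query expansion for text mining",
    ]
    for score, doc in rank_documents("text mining", docs):
        print(f"{score:.3f}  {doc}")
```

A real pipeline would more likely use TF-IDF weights and proper tokenization; raw counts are used here only to keep the sketch dependency-free.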
Pages: 87-91
Page count: 5
Related papers
50 in total
  • [1] Evaluation of Stability and Similarity of Latent Dirichlet Allocation
    Tang, Jun
    Huo, Ruilong
    Yao, Jiali
    [J]. 2013 FOURTH WORLD CONGRESS ON SOFTWARE ENGINEERING (WCSE), 2013, : 78 - 83
  • [2] Extractive summarization of Malayalam documents using latent Dirichlet allocation: An experience
    Kondath, Manju
    Suseelan, David Peter
    Idicula, Sumam Mary
    [J]. JOURNAL OF INTELLIGENT SYSTEMS, 2022, 31 (01) : 393 - 406
  • [3] Discovering Latent Topics by Gaussian Latent Dirichlet Allocation and Spectral Clustering
    Yuan, Bo
    Gao, Xinbo
    Niu, Zhenxing
    Tian, Qi
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [4] Modeling and Discovering Occupancy Patterns in Sensor Networks Using Latent Dirichlet Allocation
    Castanedo, Federico
    Aghajan, Hamid
    Kleihorst, Richard
    [J]. FOUNDATIONS ON NATURAL AND ARTIFICIAL COMPUTATION: 4TH INTERNATIONAL WORK-CONFERENCE ON THE INTERPLAY BETWEEN NATURAL AND ARTIFICIAL COMPUTATION, IWINAC 2011, PART I, 2011, 6686 : 481 - 490
  • [5] Discovering research topics from library electronic references using latent Dirichlet allocation
    Fang, Debin
    Yang, Haixia
    Gao, Baojun
    Li, Xiaojun
    [J]. LIBRARY HI TECH, 2018, 36 (03) : 400 - 410
  • [6] Discovering Traceability between Business Process and Software Component using Latent Dirichlet Allocation
    Baskara, Andreyan Rizky
    Sarno, Riyanarto
    Solichah, Adhatus
    [J]. 2016 INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTING (ICIC), 2016, : 251 - 256
  • [7] Diabetic complication prediction using a similarity-enhanced latent Dirichlet allocation model
    Ding, Shuai
    Li, Zhenmin
    Liu, Xiao
    Huang, Hui
    Yang, Shanlin
    [J]. INFORMATION SCIENCES, 2019, 499 : 12 - 24
  • [8] Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference
    Matsutani, Taro
    Ueno, Yuki
    Fukunaga, Tsukasa
    Hamada, Michiaki
    [J]. BIOINFORMATICS, 2019, 35 (22) : 4543 - 4552
  • [9] Latent Dirichlet Allocation in Discovering Goals in Patients Undergoing Bladder Cancer Surgery
    Li, Yuelin
    Atkinson, Thomas M.
    [J]. 2018 IEEE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2018, : 540 - 546
  • [10] Semantic similarity measure for topic modeling using latent Dirichlet allocation and collapsed Gibbs sampling
    Micheal Olalekan Ajinaja
    Adebayo Olusola Adetunmbi
    Chukwuemeka Christian Ugwu
    Olugbemiga Solomon Popoola
    [J]. Iran Journal of Computer Science, 2023, 6 (1) : 81 - 94