A Hybrid Model for Documents Representation

Cited by: 1
Authors
Mohamed, Dina [1]
El-Kilany, Ayman [1]
Mokhtar, Hoda M. O. [1]
Affiliations
[1] Cairo Univ, Fac Comp & Artificial Intelligence, Giza, Egypt
Keywords
Document representation; Latent Dirichlet Allocation; hierarchical Latent Dirichlet Allocation; Word2vec; Isolation Forest
DOI
10.14569/IJACSA.2021.0120339
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Text representation is critical for extracting insight from text. Many models represent text in defined forms such as numeric vectors, which makes it easy to compute the similarity between documents using well-known distance measures. In this paper, we build a model that represents text semantically, for either a single document or multiple documents, by combining hierarchical Latent Dirichlet Allocation (hLDA), Word2vec, and Isolation Forest. The proposed model learns a vector for each document from the relationship between its word vectors and the hierarchy of topics generated by hLDA. An Isolation Forest model then represents multiple documents as a single profile, which makes it easier to find documents similar to that profile. The proposed model outperforms traditional text representation models when used to represent scientific papers for content-based recommendation of scientific papers to researchers.
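To illustrate the idea of building a document vector from word vectors and topic information, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not say how the word vectors and the hLDA hierarchy are combined, so the per-word `topic_weights`, the weighted average in `document_vector`, and the toy two-dimensional vectors are all assumptions made for illustration. The Isolation Forest profiling step is not shown; documents are simply compared with cosine similarity.

```python
import math


def document_vector(tokens, word_vectors, topic_weights):
    """Return a weighted average of the word vectors of `tokens`.

    `topic_weights` is a hypothetical per-word weight standing in for
    whatever signal the topic hierarchy provides (an assumption).
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in word_vectors:
            continue  # skip out-of-vocabulary words
        w = topic_weights.get(tok, 1.0)
        for i, x in enumerate(word_vectors[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data (made up for illustration, not taken from the paper).
word_vectors = {
    "topic": [1.0, 0.0],
    "model": [0.8, 0.2],
    "haze": [0.0, 1.0],
    "image": [0.1, 0.9],
}
topic_weights = {"topic": 2.0, "model": 1.0, "haze": 2.0, "image": 1.0}

doc_a = document_vector(["topic", "model"], word_vectors, topic_weights)
doc_b = document_vector(["model", "topic", "unknown"], word_vectors, topic_weights)
doc_c = document_vector(["haze", "image"], word_vectors, topic_weights)

print(cosine(doc_a, doc_b) > cosine(doc_a, doc_c))
```

In this toy setup, `doc_a` and `doc_b` share the same known words and so end up closer to each other than to `doc_c`, which is the property a similarity-based recommender would rely on.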
Pages: 317-324
Page count: 8