A Hybrid Model for Documents Representation

Cited by: 1
Authors
Mohamed, Dina [1]
El-Kilany, Ayman [1]
Mokhtar, Hoda M. O. [1]
Affiliation
[1] Cairo Univ, Fac Comp & Artificial Intelligence, Giza, Egypt
Keywords
Document representation; Latent Dirichlet Allocation; hierarchical Latent Dirichlet Allocation; Word2vec; Isolation Forest
DOI
10.14569/IJACSA.2021.0120339
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Text representation is a critical step in uncovering the insights behind a text. Many models have been developed to represent text in well-defined forms, such as numeric vectors, in which the similarity between documents can easily be computed using well-known distance measures. In this paper, we aim to build a model that represents text semantically, for either a single document or multiple documents, by combining the hierarchical Latent Dirichlet Allocation (hLDA), Word2vec, and Isolation Forest models. The proposed model learns a vector for each document from the relationship between its words' vectors and the hierarchy of topics generated by the hLDA model. The Isolation Forest model is then used to represent multiple documents as a single profile, which facilitates finding documents similar to that profile. When applied to represent scientific papers for content-based recommendation of papers to researchers, the proposed text representation model outperforms traditional text representation models.
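The profile idea in the abstract can be illustrated with a minimal, self-contained sketch. The abstract does not say how the hLDA topic hierarchy is combined with the Word2vec vectors, so this sketch substitutes a plain average of word vectors (the hypothetical `mean_vector`); it is not the authors' method. It then fits a small from-scratch Isolation Forest on the vectors of a researcher's documents and ranks candidate papers by the standard Isolation Forest anomaly score s = 2^(-E[h(x)]/c(psi)), on the assumption that a lower score means "more like the profile". All names (`IsolationForestProfile`, `rank`, the toy embeddings) are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def mean_vector(words, embeddings):
    """Average the embeddings of known words.

    Hypothetical stand-in for the paper's hLDA-aware document vector.
    """
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        raise ValueError("document has no words with an embedding")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def c(n):
    """Expected path length of an unsuccessful BST search over n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # Only features with a non-zero range can separate the points.
    ranges = [(d, min(p[d] for p in points), max(p[d] for p in points))
              for d in range(len(points[0]))]
    ranges = [r for r in ranges if r[1] < r[2]]
    if not ranges:
        return {"size": len(points)}
    dim, lo, hi = rng.choice(ranges)
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for unsplit leaves."""
    if "split" not in node:
        return depth + c(node["size"])
    branch = "left" if x[node["dim"]] < node["split"] else "right"
    return path_length(x, node[branch], depth + 1)


class IsolationForestProfile:
    """Hypothetical profile learned from the vectors of several documents."""

    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, vectors):
        if len(vectors) < 2:
            raise ValueError("a profile needs at least two documents")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(vectors))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(rng.sample(vectors, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def anomaly_score(self, x):
        """Score in (0, 1]; lower means x looks more like the profile."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))

    def rank(self, candidates):
        """Return candidate ids sorted from most to least profile-like."""
        return sorted(candidates, key=lambda k: self.anomaly_score(candidates[k]))


if __name__ == "__main__":
    embeddings = {  # toy 2-D word vectors, for illustration only
        "topic": [1.0, 1.0], "model": [0.9, 1.1], "dirichlet": [1.1, 0.9],
        "embedding": [1.0, 0.9], "word": [0.9, 1.0], "vector": [1.1, 1.1],
        "protein": [5.0, 5.0], "folding": [5.2, 4.8],
    }
    profile_docs = [["topic", "model"], ["dirichlet", "topic"], ["word", "vector"],
                    ["embedding", "model"], ["topic", "vector"]]
    profile = IsolationForestProfile(n_trees=200).fit(
        [mean_vector(d, embeddings) for d in profile_docs])
    candidates = {"lda-survey": mean_vector(["dirichlet", "model"], embeddings),
                  "protein-paper": mean_vector(["protein", "folding"], embeddings)}
    print(profile.rank(candidates))
```

A real pipeline would more likely use library implementations (for example, a trained Word2vec model and scikit-learn's `IsolationForest`); the from-scratch version is kept here only so the sketch runs without dependencies.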
Pages: 317-324
Page count: 8
Related papers
50 in total
  • [1] Hybrid model for knowledge representation
    Shetty, Reena T. N.
    Riccio, Pierre-Michel
    Quinqueton, Joel
    2006 INTERNATIONAL CONFERENCE ON HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS, 2006, : 355 - +
  • [2] Grove Data Model for Efficient Representation of XML Documents
    Anwar, Yasmin
    Kamel, Amr
    Ahmed, Aziza Saad
    WOCN: 2009 IFIP INTERNATIONAL CONFERENCE ON WIRELESS AND OPTICAL COMMUNICATIONS NETWORKS, 2009, : 99 - +
  • [3] Deep neural annealing model for the semantic representation of documents
    de Mendonca, Leandro R. C.
    da Cruz Junior, Gelson
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2020, 96
  • [4] Filtering documents with a hybrid neural network model
    Bologna, Guido
    Boretti, Mathieu
    Albuquerque, Paul
    BIO-INSPIRED MODELING OF COGNITIVE TASKS, PT 1, PROCEEDINGS, 2007, 4527 : 261 - +
  • [5] A Hubel Weisel model for hierarchical representation of concepts in textual documents
    Ramanathan, Kiruthika
    Shi Luping
    Chong, Chong Tow
    COGNITION IN FLUX, 2010, : 1106 - 1111
  • [6] POI Representation Learning by a Hybrid Model
    Li, Yurui
    Chen, Hongmei
    Wang, Lizhen
    Xiao, Qing
    2019 20TH INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2019), 2019, : 485 - 490
  • [7] A Hybrid Representation Model for Service Contracts
    Jaramillo, Gloria Elena
    Ardagna, Claudio A.
    Anisetti, Marco
    2015 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY RESEARCH (ICTRC), 2015, : 246 - 249
  • [8] Using Hybrid Methods and 'Core Documents' for the Representation of Clusters and Topics: The Astronomy Dataset
    Glanzel, Wolfgang
    Thijs, Bart
    PROCEEDINGS OF ISSI 2015 ISTANBUL: 15TH INTERNATIONAL SOCIETY OF SCIENTOMETRICS AND INFORMETRICS CONFERENCE, 2015, : 1085 - 1090
  • [9] Using hybrid methods and 'core documents' for the representation of clusters and topics: the astronomy dataset
    Glanzel, Wolfgang
    Thijs, Bart
    SCIENTOMETRICS, 2017, 111 (02) : 1071 - 1087