A Hybrid Model for Documents Representation

Cited by: 1
Authors
Mohamed, Dina [1]
El-Kilany, Ayman [1]
Mokhtar, Hoda M. O. [1]
Affiliations
[1] Cairo Univ, Fac Comp & Artificial Intelligence, Giza, Egypt
Keywords
Document representation; Latent Dirichlet Allocation; hierarchical Latent Dirichlet Allocation; Word2vec; Isolation Forest
DOI
10.14569/IJACSA.2021.0120339
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Text representation is critical for extracting insight from text. Many models represent text in a defined form such as a numeric vector, which makes it easy to calculate the similarity between documents using well-known distance measures. In this paper, we aim to build a model that represents text semantically, for either a single document or multiple documents, by combining hierarchical Latent Dirichlet Allocation (hLDA), Word2vec, and Isolation Forest models. The proposed model learns a vector for each document from the relationship between its words' vectors and the hierarchy of topics generated by hLDA. An Isolation Forest model then represents multiple documents as a single profile, which makes it easier to find documents similar to that profile. The proposed model outperforms traditional text representation models when used to represent scientific papers for content-based recommendation of scientific papers to researchers.
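As a rough illustration of the vector-based comparison the abstract describes, the following minimal sketch builds a document vector as a weighted average of word vectors and compares documents with cosine similarity. It is not the authors' implementation: the function names, the toy 2-D vectors, and the `topic_weight` values (standing in for information that an hLDA topic hierarchy might provide) are all assumptions made for illustration, and the Isolation Forest profile step is not covered.

```python
import math


def weighted_doc_vector(tokens, word_vectors, topic_weight):
    """Average the vectors of known tokens, weighting each by a per-word weight."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in word_vectors:
            continue  # skip out-of-vocabulary words
        w = topic_weight.get(tok, 1.0)  # default weight when none is given
        for i, x in enumerate(word_vectors[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-D "embeddings" (assumed); real Word2vec vectors have many more dimensions.
word_vectors = {
    "topic": [1.0, 0.0],
    "model": [0.8, 0.2],
    "forest": [0.0, 1.0],
}
# Hypothetical per-word weights standing in for hLDA topic-hierarchy information.
topic_weight = {"topic": 2.0, "model": 1.0, "forest": 1.0}

doc_a = weighted_doc_vector(["topic", "model"], word_vectors, topic_weight)
doc_b = weighted_doc_vector(["topic", "model", "unknown"], word_vectors, topic_weight)
doc_c = weighted_doc_vector(["forest"], word_vectors, topic_weight)

sim_ab = cosine_similarity(doc_a, doc_b)  # same known words, so nearly identical
sim_ac = cosine_similarity(doc_a, doc_c)  # different words, so much less similar
```

In this toy setup `doc_a` and `doc_b` end up with the same vector because the unknown word is skipped, while `doc_c` points in a nearly orthogonal direction, so a similarity threshold would separate them.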
Pages: 317-324
Page count: 8
Related papers
50 items in total
  • [31] The Bag-of-Repeats Representation of Documents
    Galle, Matthias
    SIGIR'13: THE PROCEEDINGS OF THE 36TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, 2013, : 1053 - 1056
  • [32] A typeful and tagless representation for XML documents
    Zhu, DP
    Xi, HW
    PROGRAMMING LANGUAGES AND SYSTEMS, PROCEEDINGS, 2003, 2895 : 89 - 104
  • [33] Research on the Formal Representation of ATML Documents
    Fan, Shuyi
    Jiang, Huixia
    Wei, Baohua
    Liu, Wanming
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, 2018, 423 : 959 - 967
  • [34] Discourse Representation Parsing for Sentences and Documents
    Liu, Jiangming
    Cohen, Shay B.
    Lapata, Mirella
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 6248 - 6262
  • [35] Data and Documents: Methods of Information Representation
    Khodorovskii, L. A.
    SCIENTIFIC AND TECHNICAL INFORMATION PROCESSING, 2014, 41 (01) : 47 - 56
  • [36] Indiscriminateness in Representation Spaces of Terms and Documents
    Claveau, Vincent
    ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018), 2018, 10772 : 251 - 262
  • [37] A Hybrid of Deep Sentence Representation and Local Feature Representation Model for Question Answer Selection
    Tang, Dongge
    Rong, Wenge
    Shi, Libin
    Yang, Haodong
    Xiong, Zhang
    2018 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC 2018), 2018, : 280 - 283
  • [38] From Sentences to Documents: Extending Abstract Meaning Representation for Understanding Documents
    Moreda, Paloma
    Suarez, Armando
    Lloret, Elena
    Saquete, Estela
    Moreno, Isabel
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2018, (60) : 61 - 68
  • [39] Criminal Action Graph: A semantic representation model of judgement documents for legal charge prediction
    Feng, Geya
    Qin, Yongbin
    Huang, Ruizhang
    Chen, Yanping
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (05)
  • [40] Hybrid high dimensional model representation (HHDMR) on the partitioned data
    Tunga, MA
    Demiralp, M
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2006, 185 (01) : 107 - 132