Reducing explicit semantic representation vectors using Latent Dirichlet Allocation

Cited by: 9
Authors
Saif, Abdulgabbar [1]
Ab Aziz, Mohd Juzaiddin [1]
Omar, Nazlia [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Ctr Artificial Intelligence Technol, Bangi 43600, Selangor, Malaysia
Keywords
Semantic representation; Explicit Semantic Analysis; Topic modeling; Knowledge-based method; Relatedness; Similarity
DOI
10.1016/j.knosys.2016.03.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Explicit Semantic Analysis (ESA) is a knowledge-based method that builds the semantic representation of words from the textual descriptions of concepts in a given knowledge source. Owing to its simplicity and success, ESA has received wide attention from researchers in computational linguistics and information retrieval. However, the representation vectors produced by ESA are generally very large and high-dimensional, and may contain many redundant concepts. In this paper, we introduce a reduced semantic representation method that constructs the semantic interpretation of words as vectors over latent topics derived from the original ESA representation vectors. To model the latent topics, Latent Dirichlet Allocation (LDA) is adapted to the ESA vectors so that topics are extracted as probability distributions over concepts, rather than over words as in the traditional model. The proposed method is applied to two knowledge sources widely used in computational semantic analysis: WordNet and Wikipedia. For evaluation, we use the proposed method in two natural language processing tasks: measuring semantic relatedness between words/texts, and text clustering. The experimental results indicate that the proposed method overcomes the limitations of the ESA representation. © 2016 Elsevier B.V. All rights reserved.
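The idea in the abstract, treating each word's ESA vector as a "document" whose tokens are concepts and then fitting LDA so that topics are distributions over concepts, can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the toy ESA weights, the `scale` factor used to turn weights into integer counts, the hyperparameters, and the plain collapsed Gibbs sampler are all assumptions made here for the example.

```python
import math
import random

# Toy ESA vectors (word -> {concept: weight}); the values are made up for illustration.
esa = {
    "cat":   {"Felidae": 0.9, "Pet": 0.6, "Mammal": 0.4},
    "dog":   {"Canidae": 0.9, "Pet": 0.7, "Mammal": 0.4},
    "car":   {"Automobile": 0.9, "Engine": 0.6, "Road": 0.3},
    "truck": {"Automobile": 0.8, "Engine": 0.5, "Road": 0.5},
}
N_TOPICS = 2

concepts = sorted({c for vec in esa.values() for c in vec})
concept_id = {c: i for i, c in enumerate(concepts)}


def to_tokens(vec, scale=10):
    """Assumption: discretize ESA weights into repeated concept tokens."""
    return [concept_id[c] for c, w in vec.items() for _ in range(round(w * scale))]


def lda_gibbs(docs, n_concepts, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic distributions."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_concept = [[0] * n_concepts for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every concept token.
    for d, doc in enumerate(docs):
        zs = []
        for c in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_concept[k][c] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, c in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_concept[k][c] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_concept[t][c] + beta)
                    / (topic_total[t] + beta * n_concepts)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_concept[k][c] += 1
                topic_total[k] += 1
    # Smoothed topic proportions: the reduced representation of each word.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + alpha * n_topics) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


words = list(esa)
theta = lda_gibbs([to_tokens(esa[w]) for w in words], len(concepts), N_TOPICS)
reduced = dict(zip(words, theta))

for a, b in [("cat", "dog"), ("cat", "car")]:
    print(a, b, cosine(reduced[a], reduced[b]))
```

Each word ends up represented by `N_TOPICS` numbers instead of one weight per concept, and relatedness is then measured on those shorter vectors. A real experiment would start from ESA vectors built over WordNet or Wikipedia and would likely use an established LDA implementation rather than this hand-rolled sampler.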
Pages: 145-159
Page count: 15