Prioritized Named Entity Driven LDA for Document Clustering

Cited by: 1
Authors
Kumar, Durgesh [1]
Singh, Sanasam Ranbir [1]
Affiliation
[1] Indian Inst Technol Guwahati, Dept Comp Sci & Engn, Gauhati, India
Keywords
Topic modeling; LDA; Entity-driven topics; PNE-LDA
DOI
10.1007/978-3-030-34872-4_33
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Topic modeling methods such as LSI, pLSI, and LDA have been widely studied in the text mining domain for applications such as document representation, document clustering and classification, and information retrieval. However, such unsupervised methods are effective mainly on corpora with well-separated topics. In real-world applications, topics may overlap heavily. For example, in a news corpus covering different terror attacks, reports of different events share many keywords. In this paper, we propose a variant of LDA, named Prioritized Named Entity driven LDA (PNE-LDA), which addresses the issue of overlapping topics by prioritizing the named entities related to the topics. Across various experimental setups, the proposed method outperforms its counterparts on entity-driven overlapping topics.
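The abstract does not say how PNE-LDA prioritizes named entities, so the following is only a rough illustration of the underlying intuition and not the authors' method. It assumes a plain term-weighting scheme in which named-entity tokens are upweighted in a bag-of-words representation. The function names, the `boost` parameter, the toy documents, and the entity names `alphaville` and `betatown` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def weighted_counts(tokens, entities, boost=3.0):
    """Bag-of-words counts with entity tokens multiplied by `boost`.

    Hypothetical helper: a simple stand-in for "prioritizing" named
    entities, not the weighting used in PNE-LDA.
    """
    counts = Counter(tokens)
    return {t: c * (boost if t in entities else 1.0) for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse count dictionaries."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two toy reports of different events that share most of their keywords.
doc_a = "attack blast police casualties alphaville alphaville".split()
doc_b = "attack blast police casualties betatown betatown".split()
entities = {"alphaville", "betatown"}  # assumed output of some NER step

# Similarity with no entity weighting versus with entity tokens boosted.
plain = cosine(weighted_counts(doc_a, entities, boost=1.0),
               weighted_counts(doc_b, entities, boost=1.0))
boosted = cosine(weighted_counts(doc_a, entities, boost=3.0),
                 weighted_counts(doc_b, entities, boost=3.0))
```

In this toy setting the boosted representation makes the two reports look less alike, because the distinguishing entity tokens carry more of each vector's weight. A weighted count matrix of this kind could then be passed to a standard LDA implementation, although whether that resembles the model in the paper is not something the abstract confirms.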
Pages: 294-301
Page count: 8