Prioritized Named Entity Driven LDA for Document Clustering

Cited by: 1
Authors
Kumar, Durgesh [1]
Singh, Sanasam Ranbir [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Comp Sci & Engn, Gauhati, India
Keywords
Topic modeling; LDA; Entity-driven topics; PNE-LDA;
DOI
10.1007/978-3-030-34872-4_33
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Topic modeling methods such as LSI, pLSI, and LDA have been widely studied in the text mining domain for applications such as document representation, document clustering and classification, and information retrieval. However, such unsupervised methods are effective on corpora with well-separated topics. In real-world applications, topics may overlap heavily. For example, in a news corpus covering different terror attacks, reports of different events share many keywords. In this paper, we propose a variant of LDA, named Prioritized Named Entity driven LDA (PNE-LDA), which addresses the issue of overlapping topics by prioritizing named entities related to the topics. Across various experimental setups, the proposed method is observed to outperform its counterparts on entity-driven overlapping topics.
Pages: 294-301
Number of pages: 8
Related papers
50 in total
  • [31] Hybrid medical named entity recognition using document structure and surrounding context
    Landolsi, Mohamed Yassine
    Romdhane, Lotfi Ben
    Hlaoua, Lobna
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (04): : 5011 - 5041
  • [32] Exploiting global contextual information for document-level named entity recognition
    Yu, Yiting
    Wang, Zanbo
    Wei, Wei
    Zhang, Ruihan
    Mao, Xian-Ling
    Feng, Shanshan
    Wang, Fei
    He, Zhiyong
    Jiang, Sheng
    KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [33] Document-Level Named Entity Recognition by Incorporating Global and Neighbor Features
    Hu, Anwen
    Dou, Zhicheng
    Wen, Ji-rong
    INFORMATION RETRIEVAL (CCIR 2019), 2019, 11772 : 79 - 91
  • [34] Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities
    Montalvo, Soto
    Martinez, Raquel
    Casillas, Arantza
    Fresno, Victor
    COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 1145 - 1152
  • [35] Combining data-driven systems for improving named entity recognition
    Kozareva, Z
    Ferrández, O
    Montoyo, A
    Muñoz, R
    Suárez, A
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, 2005, 3513 : 80 - 90
  • [36] Data and knowledge-driven named entity recognition for cyber security
    Gao, Chen
    Zhang, Xuan
    Liu, Hui
    CYBERSECURITY, 2021, 4 (01)
  • [37] Review of Data-Driven Approaches to Chinese Named Entity Recognition
    Xiao, Lei
    Chen, Zhenjia
    Computer Engineering and Applications, 2024, 60 (16) : 34 - 48
  • [38] Combining data-driven systems for improving named entity recognition
    Kozareva, Z.
    Ferrandez, O.
    Montoyo, A.
    Munoz, R.
    Suarez, A.
    Gomez, J.
    DATA & KNOWLEDGE ENGINEERING, 2007, 61 (03) : 449 - 466
  • [39] Data and knowledge-driven named entity recognition for cyber security
    Chen Gao
    Xuan Zhang
    Hui Liu
    Cybersecurity, 4
  • [40] A Template-Driven Framework for Chinese Medical Named Entity Recognition
    Song, Yilin
    Kong, Fang
    Ji, Shengjie
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024, 2024, 14878 : 398 - 409