Prioritized Named Entity Driven LDA for Document Clustering

Cited by: 1
Authors
Kumar, Durgesh [1]
Singh, Sanasam Ranbir [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Comp Sci & Engn, Gauhati, India
Keywords
Topic modeling; LDA; Entity-driven topics; PNE-LDA
DOI
10.1007/978-3-030-34872-4_33
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Topic modeling methods such as LSI, pLSI, and LDA have been widely studied in the text mining domain for applications including document representation, document clustering and classification, and information retrieval. However, such unsupervised methods are effective on corpora with well-separable topics. In real-world applications, topics may overlap heavily. For example, in a news corpus covering different terror attacks, reports of different events share many of the same keywords. In this paper, we propose a variant of LDA, named Prioritized Named Entity driven LDA (PNE-LDA), which addresses the issue of overlapping topics by prioritizing named entities related to the topics. Across various experimental setups, the proposed method is observed to outperform its counterparts on entity-driven overlapping topics.
Pages: 294-301
Number of pages: 8