HDPauthor: A New Hybrid Author-Topic Model using Latent Dirichlet Allocation and Hierarchical Dirichlet Processes

被引:1
|
作者
Yang, Ming [1 ]
Hsu, Willian H. [1 ]
机构
[1] Kansas State Univ, Comp & Informat Sci, Manhattan, KS 66506 USA
关键词
Topic Modeling; Hierarchical Dirichlet Process;
D O I
10.1145/2872518.2890561
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We present a new approach towards capturing topic interests corresponding to all the observed latent topics generated by an author in documents to which he or she has contributed. Topic models based on Latent Dirichlet Allocation (LDA) have been built for this purpose but are brittle as to the number of topics allowed for a collection and for each author of documents within the collection. Meanwhile, topic models based upon Hierarchical Dirichlet Processes (HDPs) allow an arbitrary number of topics to be discovered and generative distributions of interest inferred from text corpora, but this approach is not directly extensible to generative models of authors as contributors to documents with variable topical expertise. Our approach combines an existing HDP framework for learning topics from free text with latent authorship learning within a generative model using author list information. This model adds another layer into the current hierarchy of HDPs to represent topic groups shared by authors, and the document topic distribution is represented as a mixture of topic distribution of its authors. Our model automatically learns author contribution partitions for documents in addition to topics.
引用
收藏
页码:619 / 624
页数:6
相关论文
共 50 条
  • [1] A Hybrid Latent Dirichlet Allocation Approach for Topic Classification
    Hsu, Chi-I
    Chiu, Chaochang
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (INISTA), 2017, : 312 - 315
  • [2] Author Identification Using Latent Dirichlet Allocation
    Calvo, Hiram
    Hernandez-Castaneda, Angel
    Garcia-Flores, Jorge
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 303 - 312
  • [3] Topic Selection in Latent Dirichlet Allocation
    Wang, Biao
    Liu, Zelong
    Li, Maozhen
    Liu, Yang
    Qi, Man
    [J]. 2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2014, : 756 - 760
  • [4] Topic Modeling Using Latent Dirichlet allocation: A Survey
    Chauhan, Uttam
    Shah, Apurva
    [J]. ACM COMPUTING SURVEYS, 2021, 54 (07)
  • [5] Using Latent Dirichlet Allocation for Topic Modelling in Twitter
    Ostrowski, David Alfred
    [J]. 2015 IEEE 9TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2015, : 493 - 497
  • [6] Topic Model Allocation of Conversational Dialogue Records by Latent Dirichlet Allocation
    Yeh, Jui-Feng
    Lee, Chen-Hsien
    Tan, Yi-Shiuan
    Yu, Liang-Chih
    [J]. 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [7] An Improved Latent Dirichlet Allocation Model for Hot Topic Extraction
    Liu, Guolong
    Xu, Xiaofei
    Zhu, Ying
    Li, Li
    [J]. 2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014, : 470 - 476
  • [8] Research Topic Analysis in Engineering Management Using a Latent Dirichlet Allocation Model
    Kim, Jin Ho
    Chen, Weiru
    [J]. JOURNAL OF INDUSTRIAL INTEGRATION AND MANAGEMENT-INNOVATION AND ENTREPRENEURSHIP, 2018, 3 (04):
  • [9] Topic modeling for expert finding using latent Dirichlet allocation
    Momtazi, Saeedeh
    Naumann, Felix
    [J]. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2013, 3 (05) : 346 - 353
  • [10] Formation of Hierarchical Object Concept Using Hierarchical Latent Dirichlet Allocation
    Ando, Yoshiki
    Nakamura, Tomoaki
    Araki, Takaya
    Nagai, Takayuki
    [J]. 2013 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2013, : 2272 - 2279