News Thread Extraction Based on Topical N-Gram Model with a Background Distribution

Cited by: 0
Authors
Yan, Zehua [1 ]
Li, Fang [1 ]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Keywords
news thread; LDA; N-gram; background distribution;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic thread extraction for news events can help people understand the different aspects of a news event. In this paper, we present an extraction method that uses a topical N-gram model with a background distribution (TNB). Unlike most topic models, such as Latent Dirichlet Allocation (LDA), which rely on the bag-of-words assumption, our model treats words in their textual order. Each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads. Thus our model can capture "presidential election", which recurs across different years, as a background phrase and "Obama wins" as a thread for the event "2008 USA presidential election". We apply our method to two different corpora. Evaluation based on human judgment shows that the model can generate meaningful and interpretable threads from a news corpus.
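The representation described in the abstract, a report drawn partly from a corpus-wide background distribution and partly from a mixture over threads, can be illustrated with a small sampling sketch. This is not the TNB model itself: the abstract gives no inference procedure or parameterization, so the function name `generate_report`, the `background_prob` switch, and all phrases and probabilities below are hypothetical and chosen only for illustration. Phrases are also treated as ready-made units here, whereas the paper's model forms N-grams from word order.

```python
import random

# Toy, hand-picked distributions (illustrative assumptions, not results from the paper).
BACKGROUND = {"presidential election": 0.7, "campaign": 0.3}
THREADS = {
    "obama_wins": {"Obama wins": 0.8, "victory speech": 0.2},
    "debates": {"televised debate": 0.6, "economic policy": 0.4},
}


def generate_report(background, threads, thread_weights, background_prob, length, seed=0):
    """Sample `length` (phrase, source) pairs from a background-plus-threads mixture.

    For each position, a biased coin decides whether the phrase comes from the
    shared background distribution or from one of the report's threads.
    """
    rng = random.Random(seed)
    thread_names = list(threads)
    weights = [thread_weights[name] for name in thread_names]
    tokens = []
    for _ in range(length):
        if rng.random() < background_prob:
            source, dist = "background", background
        else:
            # Pick a thread according to this report's thread mixture.
            source = rng.choices(thread_names, weights=weights)[0]
            dist = threads[source]
        phrases = list(dist)
        phrase = rng.choices(phrases, weights=[dist[p] for p in phrases])[0]
        tokens.append((phrase, source))
    return tokens


if __name__ == "__main__":
    mixture = {"obama_wins": 0.9, "debates": 0.1}
    for phrase, source in generate_report(BACKGROUND, THREADS, mixture, 0.4, 8):
        print(f"{source:12s} {phrase}")
```

In this sketch, a phrase such as "presidential election" is attributed to the background regardless of which thread dominates the report, which mirrors the separation the abstract describes between background phrases and thread-specific phrases.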
Pages: 416-424
Page count: 9