News Thread Extraction Based on Topical N-Gram Model with a Background Distribution

Cited by: 0
Authors
Yan, Zehua [1 ]
Li, Fang [1 ]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
Keywords
news thread; LDA; N-gram; background distribution;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic thread extraction for news events can help readers understand the different aspects of an event. In this paper, we present an extraction method that uses a topical N-gram model with a background distribution (TNB). Unlike most topic models, such as Latent Dirichlet Allocation (LDA), which rely on the bag-of-words assumption, our model treats words in their textual order. Each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads. Thus our model can capture "presidential election" across different years as a background phrase and "Obama wins" as a thread for the event "2008 USA presidential election". We apply our method to two different corpora. Evaluation based on human judgment shows that the model can generate meaningful and interpretable threads from a news corpus.
Pages: 416-424
Page count: 9
Related papers
50 in total
  • [21] LANGUAGE IDENTIFICATION BASED ON N-GRAM FEATURE EXTRACTION METHOD BY USING CLASSIFIERS
    Bayrak Hayta, Sengul
    Takci, Hidayet
    Eminli, Mubariz
    ISTANBUL UNIVERSITY-JOURNAL OF ELECTRICAL AND ELECTRONICS ENGINEERING, 2013, 13 (02): : 1629 - 1638
  • [22] Chinese keyword extraction based on N-gram and word co-occurrence
    Jiao, Hui
    Liu, Qian
    Jia, Hui-bo
    CIS WORKSHOPS 2007: INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY WORKSHOPS, 2007, : 152 - +
  • [23] A New Estimate of the n-gram Language Model
    Aouragh, Si Lhoussain
    Yousfi, Abdellah
    Laaroussi, Saida
    Gueddah, Hicham
    Nejja, Mohammed
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 211 - 215
  • [24] Development of the N-gram Model for Azerbaijani Language
    Bannayeva, Aliya
    Aslanov, Mustafa
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [25] Multiclass composite N-gram language model based on connection direction
    Yamamoto, Hirofumi
    Sagisaka, Yoshinori
    Systems and Computers in Japan, 2003, 34 (07) : 108 - 114
  • [26] SPANISH LINGUISTIC STEGANOGRAPHY BASED ON N-GRAM MODEL AND ZIPF LAW
    Munoz Munoz, Alfonso
    Argueelles Alvarez, Irina
    ARBOR-CIENCIA PENSAMIENTO Y CULTURA, 2014, 190 (768)
  • [27] Automatic Composition System based on Genetic Algorithm and N-gram Model
    Tomari, Manabu
    Sato, Masayuki
    Osana, Yuko
    2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), VOLS 1-6, 2008, : 202 - +
  • [28] Topic-Dependent-Class-Based n-Gram Language Model
    Naptali, Welly
    Tsuchiya, Masatoshi
    Nakagawa, Seiichi
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (05): : 1513 - 1525
  • [29] Code Artificiality: A Metric for the Code Stealth Based on an N-gram Model
    Kanzaki, Yuichiro
    Monden, Akito
    Collberg, Christian
    2015 IEEE/ACM 1ST INTERNATIONAL WORKSHOP ON SOFTWARE PROTECTION (SPRO), 2015, : 31 - 37
  • [30] Character-Based N-gram Model for Uyghur Text Retrieval
    Tohti, Turdi
    Xu, Lirui
    Huang, Jimmy
    Musajan, Winira
    Hamdulla, Askar
    BIOMETRIC RECOGNITION, CCBR 2018, 2018, 10996 : 678 - 688