News Thread Extraction Based on Topical N-Gram Model with a Background Distribution

Citations: 0
Authors
Yan, Zehua [1]
Li, Fang [1]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
Keywords
news thread; LDA; N-gram; background distribution
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic thread extraction for news events can help people understand the different aspects of a news event. In this paper, we present an extraction method based on a topical N-gram model with a background distribution (TNB). Unlike most topic models, such as Latent Dirichlet Allocation (LDA), which rely on the bag-of-words assumption, our model treats words in their textual order. Each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads. Thus our model can capture "presidential election", which recurs across different years, as a background phrase and "Obama wins" as a thread for the event "2008 USA presidential election". We apply our method to two different corpora. Evaluation based on human judgment shows that the model can generate meaningful and interpretable threads from a news corpus.
Pages: 416-424
Page count: 9