News Thread Extraction Based on Topical N-Gram Model with a Background Distribution

Cited by: 0

Authors:

Yan, Zehua [1]

Li, Fang [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China

Source:

NEURAL INFORMATION PROCESSING, PT II | 2011 / Vol. 7063

Keywords:

news thread; LDA; N-gram; background distribution;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic thread extraction for news events can help people understand the different aspects of a news event. In this paper, we present an extraction method that uses a topical N-gram model with a background distribution (TNB). Unlike most topic models, such as Latent Dirichlet Allocation (LDA), which rely on the bag-of-words assumption, our model treats words in their textual order. Each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads. Thus our model can model "presidential election" of different years as a background phrase and "Obama wins" as a thread for the event "2008 USA presidential election". We apply our method to two different corpora. Evaluation based on human judgment shows that the model can generate meaningful and interpretable threads from a news corpus.

Citation

Pages: 416-424

Number of pages: 9
