News Thread Extraction Based on Topical N-Gram Model with a Background Distribution

Cited: 0
Authors
Yan, Zehua [1 ]
Li, Fang [1 ]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
Keywords
news thread; LDA; N-gram; background distribution;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic thread extraction for news events can help people understand the different aspects of a news event. In this paper, we present an extraction method based on a topical N-gram model with a background distribution (TNB). Unlike most topic models, such as Latent Dirichlet Allocation (LDA), which rely on the bag-of-words assumption, our model treats words in their textual order. Each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads. Thus our model can capture "presidential election" across different years as a background phrase, while identifying "Obama wins" as a thread of the event "2008 US presidential election". We apply our method to two different corpora. Evaluation based on human judgment shows that the model can generate meaningful and interpretable threads from a news corpus.
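As a rough illustration of the generative process sketched in the abstract, the following is a minimal, hypothetical Python simulation of a TNB-style model: each word is drawn either from a corpus-wide background distribution or from one of the hidden threads, and a thread word may continue the previous word as a phrase (the N-gram part). The vocabulary, hyperparameters (alpha, beta, p_bg, p_bigram), and distribution shapes are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and placeholder hyperparameters (illustrative only; the
# paper's actual settings are not specified in this record).
vocab = ["presidential", "election", "obama", "wins", "campaign", "vote", "2008"]
V = len(vocab)
K = 2           # number of hidden news threads
alpha = 1.0     # Dirichlet prior over a report's thread mixture
beta = 0.5      # Dirichlet prior over word distributions
p_bg = 0.3      # probability a word comes from the background distribution
p_bigram = 0.4  # probability a thread word continues the previous word as a phrase

# Corpus-wide background distribution, thread-specific unigram distributions,
# and thread-specific "next word" (bigram) distributions.
background = rng.dirichlet(beta * np.ones(V))
thread_unigrams = rng.dirichlet(beta * np.ones(V), size=K)
thread_bigrams = rng.dirichlet(beta * np.ones(V), size=(K, V))

def generate_report(n_words: int) -> list[str]:
    """Sample one news report from the sketched generative process."""
    theta = rng.dirichlet(alpha * np.ones(K))   # per-report mixture over threads
    words, prev = [], None
    for _ in range(n_words):
        if rng.random() < p_bg:                 # background word shared by the corpus
            w = rng.choice(V, p=background)
        else:
            z = rng.choice(K, p=theta)          # pick a hidden thread
            if prev is not None and rng.random() < p_bigram:
                w = rng.choice(V, p=thread_bigrams[z, prev])  # extend a phrase
            else:
                w = rng.choice(V, p=thread_unigrams[z])
        words.append(vocab[w])
        prev = w
    return words

print(" ".join(generate_report(12)))
```

In this sketch, inference would run in the reverse direction: given real reports, the background, thread, and phrase distributions are the quantities to be estimated.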
Pages: 416-424
Number of pages: 9
Related Papers
50 records in total
  • [1] An N-Gram Based Method for Bengali Keyphrase Extraction
    Sarkar, Kamal
    INFORMATION SYSTEMS FOR INDIAN LANGUAGES, 2011, 139 : 36 - 41
  • [2] Advanced Information Extraction with n-gram based LSI
    Guven, Ahmet
    Bozkurt, O. Ozgur
    Kalipsiz, Oya
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 17, 2006, 17 : 13 - 18
  • [3] Differentially Private n-gram Extraction
    Kim, Kunho
    Gopi, Sivakanth
    Kulkarni, Janardhan
    Yekhanin, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Regularized Subspace n-Gram Model for Phonotactic iVector Extraction
    Soufifar, Mehdi
    Burget, Lukas
    Plchot, Oldrich
    Cumani, Sandro
    Cernocky, Jan
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 74 - 78
  • [5] An Empirical Model for n-gram Frequency Distribution in Large Corpora
    Silva, Joaquim F.
    Cunha, Jose C.
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 840 - 851
  • [6] Content Based Fake News Detection Using N-Gram Models
    Wynne, Hnin Ei
    Wint, Zar Zar
    IIWAS2019: THE 21ST INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES, 2019, : 669 - 673
  • [7] A Theoretical Model for n-gram Distribution in Big Data Corpora
    Silva, Joaquim F.
    Goncalves, Carlos
    Cunha, Jose C.
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 134 - 141
  • [8] Recasting the discriminative n-gram model as a pseudo-conventional n-gram model for LVCSR
    Zhou, Zhengyu
    Meng, Helen
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4933 - 4936
  • [9] DERIN: A data extraction information and n-gram
    Lopes Figueiredo, Leandro Neiva
    de Assis, Guilherme Tavares
    Ferreira, Anderson A.
    INFORMATION PROCESSING & MANAGEMENT, 2017, 53 (05) : 1120 - 1138
  • [10] Pseudo-Conventional N-Gram Representation of the Discriminative N-Gram Model for LVCSR
    Zhou, Zhengyu
    Meng, Helen
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (06) : 943 - 952