News Thread Extraction Based on Topical N-Gram Model with a Background Distribution

Cited: 0
Authors
Yan, Zehua [1 ]
Li, Fang [1 ]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
Keywords
news thread; LDA; N-gram; background distribution;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic thread extraction for news events can help people understand the different aspects of a news event. In this paper, we present an extraction method based on a topical N-gram model with a background distribution (TNB). Unlike most topic models, such as Latent Dirichlet Allocation (LDA), which rely on the bag-of-words assumption, our model treats words in their textual order. Each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads. Thus our model can capture "presidential election" across different years as a background phrase, while identifying "Obama wins" as a thread of the event "2008 US presidential election". We apply our method to two different corpora. Evaluation based on human judgment shows that the model can generate meaningful and interpretable threads from a news corpus.
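As a rough illustration of the generative process sketched in the abstract, the following is a minimal, hypothetical Python simulation of a TNB-style model: each word is drawn either from a corpus-wide background distribution or from one of the hidden threads, and a thread word may continue the previous word as a phrase (the N-gram part). The vocabulary, hyperparameters (alpha, beta, p_bg, p_bigram), and distribution shapes are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and placeholder hyperparameters (illustrative only; the
# paper's actual settings are not specified in this record).
vocab = ["presidential", "election", "obama", "wins", "campaign", "vote", "2008"]
V = len(vocab)
K = 2           # number of hidden news threads
alpha = 1.0     # Dirichlet prior over a report's thread mixture
beta = 0.5      # Dirichlet prior over word distributions
p_bg = 0.3      # probability a word comes from the background distribution
p_bigram = 0.4  # probability a thread word continues the previous word as a phrase

# Corpus-wide background distribution, thread-specific unigram distributions,
# and thread-specific "next word" (bigram) distributions.
background = rng.dirichlet(beta * np.ones(V))
thread_unigrams = rng.dirichlet(beta * np.ones(V), size=K)
thread_bigrams = rng.dirichlet(beta * np.ones(V), size=(K, V))

def generate_report(n_words: int) -> list[str]:
    """Sample one news report from the sketched generative process."""
    theta = rng.dirichlet(alpha * np.ones(K))   # per-report mixture over threads
    words, prev = [], None
    for _ in range(n_words):
        if rng.random() < p_bg:                 # background word shared by the corpus
            w = rng.choice(V, p=background)
        else:
            z = rng.choice(K, p=theta)          # pick a hidden thread
            if prev is not None and rng.random() < p_bigram:
                w = rng.choice(V, p=thread_bigrams[z, prev])  # extend a phrase
            else:
                w = rng.choice(V, p=thread_unigrams[z])
        words.append(vocab[w])
        prev = w
    return words

print(" ".join(generate_report(12)))
```

In this sketch, inference would run in the reverse direction: given real reports, the background, thread, and phrase distributions are the quantities to be estimated.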
Pages: 416-424
Number of pages: 9
Related Papers
50 records in total
  • [1] An N-Gram Based Method for Bengali Keyphrase Extraction
    Sarkar, Kamal
    INFORMATION SYSTEMS FOR INDIAN LANGUAGES, 2011, 139 : 36 - 41
  • [2] Advanced Information Extraction with n-gram based LSI
    Guven, Ahmet
    Bozkurt, O. Ozgur
    Kalipsiz, Oya
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 17, 2006, 17 : 13 - 18
  • [3] Differentially Private n-gram Extraction
    Kim, Kunho
    Gopi, Sivakanth
    Kulkarni, Janardhan
    Yekhanin, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Regularized Subspace n-Gram Model for Phonotactic iVector Extraction
    Soufifar, Mehdi
    Burget, Lukas
    Plchot, Oldrich
    Cumani, Sandro
    Cernocky, Jan
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 74 - 78
  • [5] An Empirical Model for n-gram Frequency Distribution in Large Corpora
    Silva, Joaquim F.
    Cunha, Jose C.
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 840 - 851
  • [6] Content Based Fake News Detection Using N-Gram Models
    Wynne, Hnin Ei
    Wint, Zar Zar
    IIWAS2019: THE 21ST INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES, 2019, : 669 - 673
  • [7] A Theoretical Model for n-gram Distribution in Big Data Corpora
    Silva, Joaquim F.
    Goncalves, Carlos
    Cunha, Jose C.
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 134 - 141
  • [8] Recasting the discriminative n-gram model as a pseudo-conventional n-gram model for LVCSR
    Zhou, Zhengyu
    Meng, Helen
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4933 - 4936
  • [9] DERIN: A data extraction information and n-gram
    Lopes Figueiredo, Leandro Neiva
    de Assis, Guilherme Tavares
    Ferreira, Anderson A.
    INFORMATION PROCESSING & MANAGEMENT, 2017, 53 (05) : 1120 - 1138
  • [10] Pseudo-Conventional N-Gram Representation of the Discriminative N-Gram Model for LVCSR
    Zhou, Zhengyu
    Meng, Helen
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (06) : 943 - 952