Story Forest: Extracting Events and Telling Stories from Breaking News

Cited by: 33
Authors
Liu, Bang [1 ,2 ]
Han, Fred X. [1 ,2 ]
Niu, Di [3 ]
Kong, Linglong [4 ]
Lai, Kunfeng [5 ]
Xu, Yu [5 ]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] 4359 Annett Common SW, Edmonton, AB T6W 2V6, Canada
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
[4] Univ Alberta, Dept Math & Stat Sci, Edmonton, AB, Canada
[5] Tencent, Platform & Content Business Grp, 10000 Shennan Ave, Shenzhen 518057, Guangdong, Peoples R China
Keywords
Story forest; EventX; document clustering; news articles organization; community detection;
DOI
10.1145/3377939
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Extracting events accurately from vast news corpora and organizing them logically is critical for news apps and search engines, which aim to organize news information collected from the Internet and present it to users in the most sensible forms. Intuitively speaking, an event is a group of news documents that report the same news incident, possibly in different ways. In this article, we describe our experience of implementing a news content organization system at Tencent to discover events from vast streams of breaking news and to evolve news story structures in an online fashion. Our real-world system faces unique challenges in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we (1) need to accurately and quickly extract distinguishable events from massive streams of long text documents, and (2) must develop the structures of event stories in an online manner, in order to guarantee a consistent user viewing experience. To address these challenges, we propose Story Forest, a set of online schemes that automatically cluster streaming documents into events, while connecting related events in growing trees to tell evolving stories. A core novelty of our Story Forest system is EventX, a semi-supervised scheme to extract events from massive Internet news corpora. EventX relies on a two-layered, graph-based clustering procedure to group documents into fine-grained events. We conducted extensive evaluations based on (1) 60 GB of real-world Chinese news data, (2) a large Chinese Internet news dataset that contains 11,748 news articles with ground-truth event labels, and (3) the 20 News Groups English dataset, as well as detailed pilot user experience studies. The results demonstrate the superior capabilities of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers.
Pages: 28
Related papers
50 in total
  • [31] Automating Complex News Stories by Capturing News Events as Data
    Caswell, David
    Dorr, Konstantin
    JOURNALISM PRACTICE, 2019, 13 (08) : 951 - 955
  • [32] Breaking bad news to cancer patients: transitioning from taboo to truth-telling in Russia
    Vvedenskaya, S.
    Tolmatchyova, O.
    Vvedenskaya, I.
    Grinykova, L.
    EJC SUPPLEMENTS, 2005, 3 (02) : 380 - 380
  • [33] Breaking bad news to cancer patients: transitioning from taboo to truth-telling in Russia
    Vvedenskaya, Elena
    Tolmacheva, Oxana
    Grinykova, Liana
    PSYCHO-ONCOLOGY, 2008, 17 : S234 - S235
  • [34] Telling stories: News media, health literacy and public policy in Canada
    Hayes, Michael
    Ross, Ian E.
    Gasher, Mike
    Gutstein, Donald
    Dunn, James R.
    Hackett, Robert A.
    SOCIAL SCIENCE & MEDICINE, 2007, 64 (09) : 1842 - 1852
  • [35] WORDS ARE SWEET - IGBO STORIES AND STORY-TELLING - UMEASIEGBU,RN
    KILSON, M
    JOURNAL OF RELIGION IN AFRICA, 1983, 14 (02) : 164 - 165
  • [36] More Than Telling a Story: Transforming Data into Visually Shared Stories
    Lee, Bongshin
    Riche, Nathalie Henry
    Isenberg, Petra
    Carpendale, Sheelagh
    IEEE COMPUTER GRAPHICS AND APPLICATIONS, 2015, 35 (05) : 84 - 90
  • [37] Telling the Columbia story: Source selection in news accounts of a shuttle accident
    Sumpter, Randall S.
    Garner, Johny T.
    SCIENCE COMMUNICATION, 2007, 28 (04) : 455 - 475
  • [38] Telling a Different Story: A Longitudinal Investigation of News Diversity in Four Countries
    de Vries, Erik
    Vliegenthart, Rens
    Walgrave, Stefaan
    JOURNALISM STUDIES, 2022, 23 (14) : 1721 - 1739
  • [39] How conflict news comes into being: Reconstructing 'reality' through telling stories
    Hoxha, Abit
    Hanitzsch, Thomas
    MEDIA WAR AND CONFLICT, 2018, 11 (01) : 46 - 64
  • [40] Extracting actors, actions and events from sports video - A fundamental approach to story tracking
    Nitta, N
    Babaguchi, N
    Kitahashi, T
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS: APPLICATIONS, ROBOTICS SYSTEMS AND ARCHITECTURES, 2000, : 718 - 721