Story Forest: Extracting Events and Telling Stories from Breaking News

Cited by: 33
Authors
Liu, Bang [1, 2]
Han, Fred X. [1, 2]
Niu, Di [3]
Kong, Linglong [4]
Lai, Kunfeng [5]
Xu, Yu [5]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] 4359 Annett Common SW, Edmonton, AB T6W 2V6, Canada
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
[4] Univ Alberta, Dept Math & Stat Sci, Edmonton, AB, Canada
[5] Tencent, Platform & Content Business Grp, 10000 Shennan Ave, Shenzhen 518057, Guangdong, Peoples R China
Keywords
Story forest; EventX; document clustering; news articles organization; community detection;
DOI
10.1145/3377939
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Extracting events accurately from vast news corpora and organizing them logically are critical for news apps and search engines, which aim to organize news information collected from the Internet and present it to users in the most sensible forms. Intuitively speaking, an event is a group of news documents that report the same news incident, possibly in different ways. In this article, we describe our experience of implementing a news content organization system at Tencent to discover events from vast streams of breaking news and to evolve news story structures in an online fashion. Our real-world system faces unique challenges in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we (1) need to accurately and quickly extract distinguishable events from massive streams of long text documents, and (2) must develop the structures of event stories in an online manner in order to guarantee a consistent user viewing experience. To address these challenges, we propose Story Forest, a set of online schemes that automatically cluster streaming documents into events while connecting related events in growing trees to tell evolving stories. A core novelty of our Story Forest system is EventX, a semi-supervised scheme to extract events from massive Internet news corpora. EventX relies on a two-layered, graph-based clustering procedure to group documents into fine-grained events. We conducted extensive evaluations based on (1) 60 GB of real-world Chinese news data, (2) a large Chinese Internet news dataset that contains 11,748 news articles with ground-truth event labels, and (3) the 20 News Groups English dataset, as well as through detailed pilot user experience studies. The results demonstrate the superior capabilities of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers.
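The abstract describes EventX only as a two-layered, graph-based clustering procedure. As a rough illustration of that general idea, and not the authors' actual procedure, the sketch below first groups keywords into coarse topics through a co-occurrence graph, then splits each topic's documents into finer events through a document similarity graph. Everything specific here is an assumption made for illustration: the function names, the keyword-set input, the use of connected components in place of community detection, and the `min_cooccur` and `doc_sim` thresholds. Each document is assumed to carry at least one keyword.

```python
from collections import defaultdict
from itertools import combinations


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        seen.add(n)
        while stack:
            cur = stack.pop()
            comp.add(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def two_layer_cluster(docs, min_cooccur=2, doc_sim=0.6):
    """Hypothetical two-layer clustering.

    docs: dict mapping doc id -> non-empty set of keywords.
    Returns a list of sets of doc ids, one set per event.
    """
    # Layer 1: keyword co-occurrence graph; components act as coarse topics.
    cooccur = defaultdict(int)
    for kws in docs.values():
        for a, b in combinations(sorted(kws), 2):
            cooccur[(a, b)] += 1
    kw_nodes = sorted({k for kws in docs.values() for k in kws})
    kw_edges = [pair for pair, c in cooccur.items() if c >= min_cooccur]
    topics = connected_components(kw_nodes, kw_edges)

    # Assign each document to the topic sharing the most keywords with it.
    by_topic = defaultdict(list)
    for doc_id, kws in docs.items():
        best = max(range(len(topics)), key=lambda i: len(kws & topics[i]))
        by_topic[best].append(doc_id)

    # Layer 2: within each topic, link similar documents; components are events.
    events = []
    for members in by_topic.values():
        edges = [(a, b) for a, b in combinations(members, 2)
                 if jaccard(docs[a], docs[b]) >= doc_sim]
        events.extend(connected_components(members, edges))
    return events
```

On a toy input, two earthquake reports that share most keywords would land in one event, a third earthquake report with different details would form its own event under the same topic, and unrelated election reports would be kept apart at the first layer. The thresholds are arbitrary and would need tuning on real data.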
Pages: 28
Related papers
50 in total
  • [1] Growing Story Forest Online from Massive Breaking News
    Liu, Bang
    Niu, Di
    Lai, Kunfeng
    Kong, Linglong
    Xu, Yu
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017 : 777 - 785
  • [2] WRITING NEWS AND TELLING STORIES
    DARNTON, R
    DAEDALUS, 1975, 104 (02) : 175 - 194
  • [3] Extracting news events from microblogs
    Repp, Oystein
    Ramampiaro, Heri
    JOURNAL OF STATISTICS & MANAGEMENT SYSTEMS, 2018, 21 (04) : 695 - 723
  • [4] Extracting and Clustering of Story Events from a Story Corpus
    Yu, Hye-Yeon
    Cheong, Yun-Gyung
    Bae, Byung-Chull
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (10) : 3498 - 3512
  • [5] FOR THE STORY TELLER. STORY TELLING AND STORIES TO TELL
    Unknown
    EDUCATION, 1914, 34 (06) : 398 - 398
  • [6] Telling better stories: strengthening the story in story and simulation
    Kemp-Benedict, Eric
    ENVIRONMENTAL RESEARCH LETTERS, 2012, 7 (04)
  • [7] HiEve: A Corpus for Extracting Event Hierarchies from News Stories
    Glavas, Goran
    Snajder, Jan
    Kordjamshidi, Parisa
    Moens, Marie-Francine
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014 : 3678 - 3683
  • [8] The Story Is True: The Art and Meaning of Telling Stories
    Bennett, Gillian
    JOURNAL OF AMERICAN FOLKLORE, 2011, 124 (494) : 339 - 340
  • [9] The Story Is True: The Art and Meaning of Telling Stories
    Fry, Don
    VIRGINIA QUARTERLY REVIEW, 2007, 83 (03) : 262 - 262
  • [10] The Story Is True: The Art and Meaning of Telling Stories
    Warner, Robbin Zeff
    WESTERN FOLKLORE, 2009, 68 (04) : 513 - 514