Temporal knowledge extraction from large-scale text corpus

Cited by: 8
Authors
Liu, Yu [1]
Hua, Wen [1]
Zhou, Xiaofang [1]
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Keywords
Temporal knowledge harvesting; Temporal patterns; Temporal facts; Knowledge base; BASE;
DOI
10.1007/s11280-020-00836-5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Knowledge, in practice, is time-variant, and many relations are valid only for a certain period of time. This phenomenon highlights the importance of harvesting time-aware knowledge, i.e., relational facts coupled with their valid temporal intervals. Inspired by pattern-based information extraction systems, we use temporal patterns to extract time-aware knowledge from free text. However, pattern design is extremely laborious and time-consuming even for a single relation, and free text is usually ambiguous, which makes temporal instance extraction extremely difficult. Therefore, in this work, we study the problem of temporal knowledge extraction in two steps: (1) temporal pattern extraction, by automatically analysing a large-scale text corpus with a small number of seed temporal facts; and (2) temporal instance extraction, by applying the identified temporal patterns. For pattern extraction, we introduce various techniques, including corpus annotation, pattern generation, scoring and clustering, to improve both the accuracy and the coverage of the extracted patterns. For instance extraction, we propose a double-check strategy to improve accuracy and a set of node-extension rules to improve coverage. We conduct extensive experiments on real-world datasets and compare our approach with state-of-the-art systems. Experimental results verify the effectiveness of our proposed methods for temporal knowledge harvesting.
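The two-step idea in the abstract (patterns learned from seed facts, then applied to find new facts) can be illustrated with a toy sketch. This is not the authors' system: the slot names, the regular expressions, the function names, and the three example sentences are all assumptions made for illustration, and the corpus annotation, scoring, clustering, double-check and node-extension steps mentioned in the abstract are not modelled. A simple occurrence count stands in for pattern scoring.

```python
import re
from collections import Counter

# Hypothetical slot vocabulary: subject, object, begin year, end year.
NAME = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"   # run of capitalised words
SLOTS = {
    "<S>": rf"(?P<subj>{NAME})",
    "<O>": rf"(?P<obj>{NAME})",
    "<T1>": r"(?P<begin>\d{4})",
    "<T2>": r"(?P<end>\d{4})",
}

def generate_patterns(sentences, seeds):
    """Step 1: abstract sentences that mention a whole seed fact into patterns."""
    counts = Counter()
    for subj, obj, begin, end in seeds:
        mentions = [(subj, "<S>"), (obj, "<O>"),
                    (str(begin), "<T1>"), (str(end), "<T2>")]
        for sent in sentences:
            if all(text in sent for text, _ in mentions):
                pattern = sent
                for text, slot in mentions:
                    # Replace only the first mention so each slot occurs once.
                    pattern = pattern.replace(text, slot, 1)
                counts[pattern] += 1   # crude support count, not a real score
    return counts

def compile_pattern(pattern):
    """Turn a slot pattern into a regex, escaping the literal text between slots."""
    parts = re.split(r"(<S>|<O>|<T1>|<T2>)", pattern)
    return re.compile("".join(SLOTS.get(p, re.escape(p)) for p in parts))

def extract_instances(sentences, patterns):
    """Step 2: apply the learned patterns to collect (subj, obj, begin, end) facts."""
    facts = set()
    for pattern in patterns:
        regex = compile_pattern(pattern)
        for sent in sentences:
            m = regex.fullmatch(sent)
            if m:
                facts.add((m["subj"], m["obj"], int(m["begin"]), int(m["end"])))
    return facts

# Illustrative corpus and a single seed temporal fact.
corpus = [
    "Barack Obama was president of the United States from 2009 to 2017.",
    "Bill Clinton was president of the United States from 1993 to 2001.",
    "The weather in Brisbane was mild.",
]
seeds = [("Barack Obama", "United States", 2009, 2017)]

patterns = generate_patterns(corpus, seeds)
facts = extract_instances(corpus, patterns)
```

With one seed, the sketch derives a single surface pattern and uses it to recover a second fact with the same wording. Exact string matching like this is brittle, which is one way to see why the abstract stresses accuracy and coverage techniques beyond plain pattern generation.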
Pages: 135-156
Page count: 22