CLUSTERING AND BOOTSTRAPPING BASED FRAMEWORK FOR NEWS KNOWLEDGE BASE COMPLETION

Cited by: 0
Authors
Srinivasa, K. [1]
Thilagam, P. Santhi [1]
Affiliation
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, NH 66, Mangalore 575025, India
Keywords
Knowledge base completion; natural language processing; information extraction; triples; bootstrap; cluster; construction
DOI
10.31577/cai_2021_2_318
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Extracting facts, namely entities and relations, from unstructured sources is an essential step in any knowledge base construction. It is equally necessary to keep the knowledge base complete by incrementally extracting new facts from various sources. To date, knowledge base completion has been studied as a knowledge refinement problem, in which missing facts are inferred by reasoning over the information already present in the knowledge base. However, facts missed while extracting information from multilingual sources have been ignored. Hence, this work proposes a generic knowledge base completion framework that enriches a knowledge base of crime-related facts, extracted from English-language online news articles, with facts extracted from news articles in Hindi, a low-resourced Indian language. With the framework, information can be extracted from news articles in any low-resourced language without language-specific tools such as part-of-speech (POS) taggers, relying instead on an appropriate machine translation tool. To achieve this, a clustering algorithm is proposed that exploits the redundancy in a bilingual collection of news articles by representing clusters with knowledge base facts rather than with the existing Bag of Words representation. Within each cluster, the facts extracted from the English articles are used to bootstrap the extraction of facts from comparable Hindi articles. Bootstrapping within a cluster in this way helps identify sentences in the low-resourced language that carry new information related to the facts extracted from a high-resourced language such as English. The empirical results show that the proposed clustering algorithm produces more accurate clusters for monolingual facts and high-quality clusters for cross-lingual facts. Experiments also show that the proposed framework achieves a high recall in extracting new facts from Hindi news articles.
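The abstract does not spell out the clustering or bootstrapping procedure, so the following Python sketch only illustrates the two ideas under explicit assumptions; it is not the authors' implementation. It assumes that Hindi articles have already been machine-translated into English, that facts have been extracted from every article as (subject, relation, object) triples with normalized entity names, that a cluster is represented by the entities of its members' facts (instead of a bag of words) and grown with a hypothetical single-pass Jaccard-threshold scheme, and that bootstrapping reduces to selecting translated Hindi sentences that mention an entity from the cluster's English facts. The names `Article`, `cluster_by_facts`, `bootstrap_sentences`, the 0.3 threshold, and the toy articles are all hypothetical.

```python
from dataclasses import dataclass

# A fact is a (subject, relation, object) triple with normalized entity names.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Article:
    doc_id: str
    lang: str                        # "en" or "hi"
    facts: frozenset[Triple]         # triples extracted from the article
    sentences: tuple[str, ...] = ()  # for "hi": assumed machine-translated to English


def entities(facts):
    """Entities appearing as the subject or object of any triple."""
    return {e for s, _, o in facts for e in (s, o)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_facts(articles, threshold=0.3):
    """Hypothetical single-pass clustering in which each cluster is represented
    by the entities of its members' facts instead of a bag of words."""
    clusters = []  # each item: [entity_set, member_list]
    for art in articles:
        ents = entities(art.facts)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(ents, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0] |= ents                      # grow the fact-based representation
            best[1].append(art)
        else:
            clusters.append([set(ents), [art]])  # start a new cluster
    return [members for _, members in clusters]


def bootstrap_sentences(cluster):
    """Use the English facts of one cluster as seeds and return the
    (doc_id, sentence) pairs of Hindi sentences that mention a seed entity."""
    seed_facts = set()
    for art in cluster:
        if art.lang == "en":
            seed_facts |= art.facts
    seeds = {e.lower() for e in entities(seed_facts)}
    hits = []
    for art in cluster:
        if art.lang == "hi":
            for sent in art.sentences:
                # Naive case-insensitive substring match on entity names.
                if any(seed in sent.lower() for seed in seeds):
                    hits.append((art.doc_id, sent))
    return hits


# Toy usage with made-up articles.
en1 = Article("en1", "en", frozenset({("Ravi", "arrested_by", "Mangalore police")}))
hi1 = Article("hi1", "hi", frozenset({("Ravi", "accused_of", "theft")}),
              ("Ravi was produced in court on Monday.", "The weather was clear."))
en2 = Article("en2", "en", frozenset({("Asha", "victim_of", "fraud")}))

clusters = cluster_by_facts([en1, hi1, en2])
print([[a.doc_id for a in c] for c in clusters])  # [['en1', 'hi1'], ['en2']]
print(bootstrap_sentences(clusters[0]))
# [('hi1', 'Ravi was produced in court on Monday.')]
```

Because the cluster representation is built from fact entities, the English and translated Hindi reports of the same incident share entities such as "Ravi" and fall into one cluster, and the English facts then act as seeds for the Hindi side. The substring matching is deliberately naive; a practical system would need proper entity normalization across languages.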
Pages: 318-340
Number of pages: 23