CLUSTERING AND BOOTSTRAPPING BASED FRAMEWORK FOR NEWS KNOWLEDGE BASE COMPLETION

Cited by: 0

Authors:

Srinivasa, K. [1]

Thilagam, P. Santhi [1]

Affiliations:

[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, NH 66, Mangalore 575025, India

Source:

COMPUTING AND INFORMATICS | 2021 / Vol. 40 / Issue 02

Keywords:

Knowledge base completion; natural language processing; information extraction; 1002triples; bootstrap; cluster; INFORMATION EXTRACTION; CONSTRUCTION;

DOI:

10.31577/cai_2021_2_318

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Extracting facts, namely entities and relations, from unstructured sources is an essential step in any knowledge base construction. At the same time, it is also necessary to ensure the completeness of the knowledge base by incrementally extracting new facts from various sources. To date, knowledge base completion has been studied as a problem of knowledge refinement, where missing facts are inferred by reasoning about the information already present in the knowledge base. However, facts missed while extracting information from multilingual sources are ignored. Hence, this work proposes a generic framework for knowledge base completion to enrich a knowledge base of crime-related facts extracted from online English-language news articles with facts extracted from news articles in Hindi, a low-resourced Indian language. Using the framework, information can be extracted from news articles in any low-resourced language without language-specific tools such as POS taggers, relying instead on an appropriate machine translation tool. To achieve this, a clustering algorithm is proposed that explores the redundancy among the bilingual collection of news articles by representing the clusters with knowledge base facts, unlike the existing Bag of Words representation. Within each cluster, the facts extracted from English-language articles are bootstrapped to extract facts from comparable Hindi-language articles. Bootstrapping within the cluster in this way helps to identify sentences in the low-resourced language that carry new information related to the facts extracted from a high-resourced language like English. The empirical results show that the proposed clustering algorithm produced more accurate and high-quality clusters for monolingual and cross-lingual facts, respectively. Experiments also demonstrate that the proposed framework achieves a high recall rate in extracting new facts from Hindi news articles.

Pages: 318 - 340

Page count: 23
