CLUSTERING AND BOOTSTRAPPING BASED FRAMEWORK FOR NEWS KNOWLEDGE BASE COMPLETION

Authors
Srinivasa, K. [1]
Thilagam, P. Santhi [1]
Affiliations
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, NH 66, Mangalore 575025, India
Keywords
Knowledge base completion; natural language processing; information extraction; triples; bootstrap; cluster; construction
DOI
10.31577/cai_2021_2_318
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Extracting facts, namely entities and relations, from unstructured sources is an essential step in any knowledge base construction. It is also necessary to keep the knowledge base complete by incrementally extracting new facts from various sources. To date, knowledge base completion has been studied as a problem of knowledge refinement, in which missing facts are inferred by reasoning about the information already present in the knowledge base. However, facts missed while extracting information from multilingual sources are ignored. Hence, this work proposes a generic framework for knowledge base completion that enriches a knowledge base of crime-related facts extracted from English-language online news articles with facts extracted from news articles in Hindi, a low-resourced Indian language. Using the framework, information can be extracted from news articles in any low-resourced language without language-specific tools such as POS taggers, relying instead on an appropriate machine translation tool. To achieve this, a clustering algorithm is proposed that exploits the redundancy in a bilingual collection of news articles by representing the clusters with knowledge base facts, unlike the existing Bag of Words representation. Within each cluster, the facts extracted from English-language articles are bootstrapped to extract facts from comparable Hindi-language articles. Bootstrapping within the cluster helps to identify sentences from the low-resourced language that carry new information related to the facts extracted from a high-resourced language such as English. The empirical results show that the proposed clustering algorithm produces more accurate and higher-quality clusters for monolingual and cross-lingual facts, respectively. Experiments also show that the proposed framework achieves a high recall rate in extracting new facts from Hindi news articles.
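The two steps described in the abstract, clustering articles by knowledge base facts rather than by Bag of Words, and bootstrapping English facts within a cluster to find related Hindi sentences, can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy single-pass clustering, the Jaccard similarity, the 0.3 threshold, the substring matching, and all names (Article, signature, cluster_by_facts, bootstrap_candidates) are assumptions made for illustration. Hindi sentences are assumed to have been machine-translated into English beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical article record; field names are illustrative only."""
    doc_id: str
    lang: str               # "en" or "hi"
    signature: frozenset    # KB entities the article mentions (assumed representation)
    facts: tuple = ()       # (subject, relation, object) triples, English articles only
    sentences: tuple = ()   # machine-translated sentences, Hindi articles only


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_facts(articles, threshold=0.3):
    """Greedy single-pass clustering over fact-based signatures (an assumption)."""
    clusters = []
    for art in articles:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(art.signature, cluster["signature"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No sufficiently similar cluster: start a new one.
            clusters.append({"signature": set(art.signature), "members": [art]})
        else:
            # Join the most similar cluster and widen its fact signature.
            best["signature"] |= art.signature
            best["members"].append(art)
    return clusters


def bootstrap_candidates(cluster):
    """Use English facts in a cluster as seeds to select translated Hindi sentences."""
    seeds = set()
    for art in cluster["members"]:
        if art.lang == "en":
            for subj, _rel, obj in art.facts:
                seeds.update({subj.lower(), obj.lower()})
    candidates = []
    for art in cluster["members"]:
        if art.lang != "hi":
            continue
        for sentence in art.sentences:
            # Keep sentences that mention at least one seed entity.
            hits = sorted(s for s in seeds if s in sentence.lower())
            if hits:
                candidates.append((art.doc_id, sentence, hits))
    return candidates
```

In this sketch, a Hindi sentence is kept as a candidate only when it mentions an entity already known from an English fact in the same cluster, which is one simple way to restrict extraction to sentences likely to add information about existing facts.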
Pages: 318-340
Page count: 23