CLUSTERING AND BOOTSTRAPPING BASED FRAMEWORK FOR NEWS KNOWLEDGE BASE COMPLETION

Authors
Srinivasa, K. [1]
Thilagam, P. Santhi [1]
Affiliation
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, NH 66, Mangalore 575025, India
Keywords
Knowledge base completion; natural language processing; information extraction; triples; bootstrap; cluster; INFORMATION EXTRACTION; CONSTRUCTION
DOI
10.31577/cai_2021_2_318
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Extracting facts, namely entities and relations, from unstructured sources is an essential step in any knowledge base construction. It is also necessary to keep the knowledge base complete by incrementally extracting new facts from various sources. To date, knowledge base completion has been studied as a knowledge refinement problem, in which missing facts are inferred by reasoning about the information already present in the knowledge base. However, facts missed while extracting information from multilingual sources are ignored. Hence, this work proposes a generic framework for knowledge base completion that enriches a knowledge base of crime-related facts, extracted from online news articles in English, with facts extracted from news articles in Hindi, a low-resourced Indian language. Using the framework, information can be extracted from news articles in any low-resourced language without language-specific tools such as POS taggers, given an appropriate machine translation tool. To achieve this, a clustering algorithm is proposed that exploits the redundancy within the bilingual collection of news articles by representing clusters with knowledge base facts rather than with the usual Bag of Words representation. Within each cluster, the facts extracted from English articles are bootstrapped to extract facts from comparable Hindi articles. Bootstrapping within the cluster helps to identify sentences in the low-resourced language that carry new information related to the facts extracted from a high-resourced language such as English. The empirical results show that the proposed clustering algorithm produces more accurate and higher-quality clusters for both monolingual and cross-lingual facts. Experiments also show that the proposed framework achieves a high recall rate in extracting new facts from Hindi news articles.
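The abstract gives no algorithmic detail, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: articles are represented as sets of (subject, relation, object) facts instead of bags of words, clusters are formed greedily by Jaccard overlap of those fact sets, and bootstrapping is approximated by selecting machine-translated sentences that mention both the subject and the object of a seed fact. The function names, the 0.5 threshold, and the matching rule are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object); an assumed representation


def jaccard(a: Set[Fact], b: Set[Fact]) -> float:
    """Jaccard similarity of two fact sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_facts(articles: Dict[str, Set[Fact]],
                     threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass clustering: each article joins the most similar
    existing cluster if the similarity reaches the (assumed) threshold,
    otherwise it starts a new cluster."""
    clusters: List[dict] = []
    for art_id, facts in articles.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(facts, cluster["facts"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"facts": set(facts), "members": [art_id]})
        else:
            best["facts"] |= facts  # the cluster is represented by its facts
            best["members"].append(art_id)
    return [c["members"] for c in clusters]


def bootstrap_sentences(seed_facts: Set[Fact],
                        translated_sentences: List[str]) -> List[str]:
    """Keep translated sentences that mention both the subject and the object
    of at least one seed fact (a simplistic stand-in for bootstrapping)."""
    selected = []
    for sentence in translated_sentences:
        lowered = sentence.lower()
        if any(s.lower() in lowered and o.lower() in lowered
               for s, _, o in seed_facts):
            selected.append(sentence)
    return selected
```

In this sketch, an English article and a translated Hindi article that share most of their facts fall into the same cluster, and any selected sentence that says more than the seed fact (for example a time or place) is a candidate source of new facts for the knowledge base.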
Pages: 318-340
Number of pages: 23