Multi-View Clustering for Open Knowledge Base Canonicalization

Cited by: 6
Authors
Shen, Wei [1]
Yang, Yang [1]
Liu, Yinan [1]
Affiliations
[1] Nankai Univ, Coll Comp Sci, TMCC, TKLNDST, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Open Knowledge Base Canonicalization; Multi-View Clustering; Training Data Optimization; VALIDITY INDEX; NUMBER;
DOI
10.1145/3534678.3539449
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Open information extraction (OIE) methods extract large numbers of OIE triples <noun phrase, relation phrase, noun phrase> from unstructured text, and these triples compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It has been found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. However, existing works have so far leveraged these two views of knowledge in isolation. In this paper, we propose CMVC, a novel unsupervised framework that leverages the two views jointly to canonicalize OKBs without the need for manually annotated labels. To achieve this goal, we propose a multi-view CH K-Means clustering algorithm that mutually reinforces the clustering of the view-specific embeddings learned from each view by considering their different clustering qualities. To further enhance canonicalization performance, we propose a training data optimization strategy that addresses data quantity and data quality in each view and refines the learned view-specific embeddings in an iterative manner. Additionally, we propose a Log-Jump algorithm that predicts the optimal number of clusters in a data-driven way without requiring any labels. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
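The abstract says the clustering step takes the "different clustering qualities" of the two views into account but does not define "CH". The sketch below assumes it refers to the Calinski-Harabasz validity index, and shows one way such a quality score could be computed per view and turned into view weights. The function names, the toy embeddings, and the proportional weighting rule are all hypothetical illustrations, not the CMVC algorithm itself.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points: List[Vector], labels: List[int]) -> float:
    """Ratio of between-cluster to within-cluster dispersion,
    each normalized by its degrees of freedom. Higher is better."""
    n = len(points)
    clusters: Dict[int, List[Vector]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index is defined only for 2 <= k < n clusters")
    overall = centroid(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0.0:
        return float("inf")  # perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))


def view_weights(scores: List[float]) -> List[float]:
    """Hypothetical rule: weight each view in proportion to its score.
    Assumes all scores are finite and positive."""
    total = sum(scores)
    return [s / total for s in scores]


# Toy embeddings of the same four phrases in two views (made-up data).
labels = [0, 0, 1, 1]
fact_view = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
context_view = [(0.0, 0.0), (0.0, 4.0), (2.0, 0.0), (2.0, 4.0)]

fact_score = calinski_harabasz(fact_view, labels)
context_score = calinski_harabasz(context_view, labels)
weights = view_weights([fact_score, context_score])
print(fact_score, context_score, weights)
```

In this toy setup the fact view separates the two clusters far more cleanly than the context view, so it receives nearly all of the weight. A real system would compute the scores on learned embeddings and would likely need a less extreme weighting rule.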
Pages: 1578-1588
Number of pages: 11
Related papers
50 in total
  • [1] CMVC+: A Multi-View Clustering Framework for Open Knowledge Base Canonicalization Via Contrastive Learning
    Yang, Yang
    Shen, Wei
    Shu, Junfeng
    Liu, Yinan
    Curry, Edward
    Li, Guoliang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (05) : 2296 - 2310
  • [2] Open knowledge base canonicalization with multi-task learning
    Liu, Bingchen
    Peng, Huang
    Zeng, Weixin
    Zhao, Xiang
    Liu, Shijun
    Pan, Li
    Li, Xin
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (05):
  • [3] Multi-level feature interaction for open knowledge base canonicalization
    Sui, Xuhui
    Zhang, Ying
    Song, Kehui
    Zhou, Baohang
    Yuan, Xiaojie
    KNOWLEDGE-BASED SYSTEMS, 2024, 303
  • [4] Towards Practical Open Knowledge Base Canonicalization
    Wu, Tien-Hsuan
    Wu, Zhiyong
    Kao, Ben
    Yin, Pengcheng
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 883 - 892
  • [5] Joint Open Knowledge Base Canonicalization and Linking
    Liu, Yinan
    Shen, Wei
    Wang, Yuanfei
    Wang, Jianyong
    Yang, Zhenglu
    Yuan, Xiaojie
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 2253 - 2261
  • [6] Multi-view clustering
    Bickel, S
    Scheffer, T
    FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 19 - 26
  • [7] Knowledge Graph Embedding Based on Multi-View Clustering Framework
    Xiao, Han
    Chen, Yidong
    Shi, Xiaodong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (02) : 585 - 596
  • [8] Multi-view Clustering Ensembles
    Xie, Xijiong
    Sun, Shiliang
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 51 - 56
  • [9] Multi-View Multiple Clustering
    Yao, Shixin
    Yu, Guoxian
    Wang, Jun
    Domeniconi, Carlotta
    Zhang, Xiangliang
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4121 - 4127
  • [10] Multi-view multitask learning for knowledge base relation detection
    Zhang, Hongzhi
    Xu, Guangluan
    Liang, Xiao
    Zhang, Weili
    Sun, Xian
    Huang, Tinglei
    KNOWLEDGE-BASED SYSTEMS, 2019, 183