Multi-View Clustering for Open Knowledge Base Canonicalization

Cited by: 6
Authors
Shen, Wei [1]
Yang, Yang [1]
Liu, Yinan [1]
Affiliation
[1] Nankai Univ, Coll Comp Sci, TMCC, TKLNDST, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Open Knowledge Base Canonicalization; Multi-View Clustering; Training Data Optimization; Validity Index; Number
DOI
10.1145/3534678.3539449
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Open information extraction (OIE) methods extract large numbers of OIE triples <noun phrase, relation phrase, noun phrase> from unstructured text, and these triples make up large open knowledge bases (OKBs). The noun phrases and relation phrases in such OKBs are not canonicalized, which leaves facts scattered and redundant. We find that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the triple's source context) provide complementary information that is vital to OKB canonicalization, the task of clustering synonymous noun phrases and relation phrases into the same group and assigning each group a unique identifier. However, existing works have so far leveraged these two views only in isolation. In this paper, we propose CMVC, a novel unsupervised framework that leverages the two views jointly to canonicalize OKBs without requiring manually annotated labels. To this end, we propose a multi-view CH K-Means clustering algorithm that lets the clusterings of the view-specific embeddings learned from each view mutually reinforce one another, taking their different clustering qualities into account. To further improve canonicalization performance, we propose a training data optimization strategy that addresses data quantity and data quality within each view and iteratively refines the learned view-specific embeddings. Additionally, we propose a Log-Jump algorithm that predicts the optimal number of clusters in a data-driven way without requiring any labels. Extensive experiments on multiple real-world OKB data sets demonstrate the superiority of our framework over state-of-the-art methods.
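The abstract describes the multi-view clustering step only at a high level. The sketch below is NOT the CMVC algorithm or its multi-view CH K-Means. It is a generic, hypothetical illustration of the underlying idea that several views of the same items can share one clustering while each view's influence depends on how well it clusters. It assumes that each view is a NumPy embedding matrix over the same items (for example, a fact-view and a context-view embedding) and that a view's weight is proportional to its Calinski-Harabasz score under the current shared labels. The function names `multi_view_kmeans` and `calinski_harabasz`, the farthest-first initialisation and the per-view distance rescaling are all illustrative assumptions. The sketch does not cover training data optimization or Log-Jump.

```python
import numpy as np


def calinski_harabasz(X, labels, k):
    """Calinski-Harabasz index of a labelling in one view (higher = better separated)."""
    n = X.shape[0]
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) == 0:
            continue  # an empty cluster contributes nothing
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / max(within / (n - k), 1e-12)


def multi_view_kmeans(views, k, n_iter=50):
    """Weighted multi-view k-means over a list of (n x d_v) arrays describing the same n items.

    Hypothetical scheme: all views share one labelling, each view keeps its own
    centroids, and a view's weight is proportional to its Calinski-Harabasz score
    under the current labels, so better-separated views get more say.
    """
    views = [np.asarray(X, dtype=float) for X in views]
    n = views[0].shape[0]
    if not 2 <= k < n:
        raise ValueError("need 2 <= k < number of items")

    # Farthest-first initialisation on the raw concatenation of all views (a simple heuristic).
    stacked = np.hstack(views)
    chosen = [0]
    min_d = ((stacked - stacked[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, ((stacked - stacked[nxt]) ** 2).sum(axis=1))
    centroids = [X[chosen].copy() for X in views]

    weights = np.full(len(views), 1.0 / len(views))  # start with equal view weights
    labels = None
    for _ in range(n_iter):
        cost = np.zeros((n, k))
        for w, X, C in zip(weights, views, centroids):
            d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # n x k squared distances
            cost += w * d / max(d.mean(), 1e-12)  # rescale so no view dominates by units alone
        new_labels = cost.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # the shared labelling is stable
        labels = new_labels
        for X, C in zip(views, centroids):
            for c in range(k):
                members = X[labels == c]
                if len(members) > 0:
                    C[c] = members.mean(axis=0)  # update this view's centroid in place
        scores = np.array([calinski_harabasz(X, labels, k) for X in views])
        if scores.sum() > 0:
            weights = scores / scores.sum()  # quality-proportional view weights
    return labels, weights
```

For example, given a cleanly separated view and a noisier view of the same items, the shared labels follow the clean view and most of the weight shifts to it. Dividing each view's distance matrix by its mean is a further assumption, made so that a view with larger numeric scales does not dominate merely because of its units.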
Pages: 1578-1588
Number of pages: 11