Multi-View Clustering for Open Knowledge Base Canonicalization

Cited by: 6
Authors
Shen, Wei [1]
Yang, Yang [1]
Liu, Yinan [1]
Affiliations
[1] Nankai Univ, Coll Comp Sci, TMCC, TKLNDST, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Open Knowledge Base Canonicalization; Multi-View Clustering; Training Data Optimization; VALIDITY INDEX; NUMBER;
DOI
10.1145/3534678.3539449
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Open information extraction (OIE) methods extract plenty of OIE triples <noun phrase, relation phrase, noun phrase> from unstructured text, which compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It is found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. However, these two views of knowledge have so far been leveraged in isolation by existing works. In this paper, we propose CMVC, a novel unsupervised framework that leverages these two views of knowledge jointly for canonicalizing OKBs without the need for manually annotated labels. To achieve this goal, we propose a multi-view CH K-Means clustering algorithm to mutually reinforce the clustering of view-specific embeddings learned from each view by considering their different clustering qualities. In order to further enhance the canonicalization performance, we propose a training data optimization strategy in terms of data quantity and data quality respectively in each particular view to refine the learned view-specific embeddings in an iterative manner. Additionally, we propose a Log-Jump algorithm to predict the optimal number of clusters in a data-driven way without requiring any labels. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
Pages: 1578-1588
Number of pages: 11
Related papers
50 in total
  • [41] Metric Multi-View Graph Clustering
    Tan, Yuze
    Liu, Yixi
    Wu, Hongjie
    Lv, Jiancheng
    Huang, Shudong
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9962 - 9970
  • [42] Joint Multi-View Collaborative Clustering
    Khalafaoui, Yasser
    Matei, Basarab
    Grozavu, Nistor
    Lovisetto, Martino
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [43] Incomplete multi-view spectral clustering
    Zhao, Qianli
    Zong, Linlin
    Zhang, Xianchao
    Liu, Xinyue
    Yu, Hong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (03) : 2991 - 3001
  • [44] Efficient multi-view clustering networks
    Ke, Guanzhou
    Hong, Zhiyong
    Yu, Wenhua
    Zhang, Xin
    Liu, Zeyi
    Applied Intelligence, 2022, 52 : 14918 - 14934
  • [45] Multi-view subspace text clustering
    Fraj, Maha
    Hajkacem, Mohamed Aymen Ben
    Essoussi, Nadia
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, : 1583 - 1606
  • [46] Evidential Weighted Multi-view Clustering
    Zhou, Kuang
    Guo, Mei
    Jiang, Ming
    BELIEF FUNCTIONS: THEORY AND APPLICATIONS (BELIEF 2021), 2021, 12915 : 22 - 32
  • [47] Latent Multi-view Subspace Clustering
    Zhang, Changqing
    Hu, Qinghua
    Fu, Huazhu
    Zhu, Pengfei
    Cao, Xiaochun
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4333 - 4341
  • [48] Multi-View MERA Subspace Clustering
    Long, Zhen
    Zhu, Ce
    Chen, Jie
    Li, Zihan
    Ren, Yazhou
    Liu, Yipeng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3102 - 3112
  • [49] FMvC: Fast Multi-View Clustering
    Wang, Jiada
    Liu, Yijun
    Ye, Wujian
    IEEE ACCESS, 2023, 11 : 12807 - 12819
  • [50] Adaptive Multi-View Subspace Clustering
    Tang Q.
    Zhang Y.
    He S.
    Zhou Z.
    Zhang, Yulong, 1600, Xi'an Jiaotong University (55): : 102 - 112