Multi-View Clustering for Open Knowledge Base Canonicalization

Cited by: 6
Authors
Shen, Wei [1]
Yang, Yang [1]
Liu, Yinan [1]
Affiliations
[1] Nankai Univ, Coll Comp Sci, TMCC, TKLNDST, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Open Knowledge Base Canonicalization; Multi-View Clustering; Training Data Optimization; VALIDITY INDEX; NUMBER;
DOI
10.1145/3534678.3539449
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Open information extraction (OIE) methods extract large numbers of OIE triples <noun phrase, relation phrase, noun phrase> from unstructured text, and these triples compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. Two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. However, existing works have so far leveraged these two views in isolation. In this paper, we propose CMVC, a novel unsupervised framework that leverages the two views jointly to canonicalize OKBs without the need for manually annotated labels. To achieve this goal, we propose a multi-view CH K-Means clustering algorithm that mutually reinforces the clustering of the view-specific embeddings learned from each view by considering their different clustering qualities. To further enhance canonicalization performance, we propose a training data optimization strategy that addresses data quantity and data quality in each view, refining the learned view-specific embeddings iteratively. Additionally, we propose a Log-Jump algorithm that predicts the optimal number of clusters in a data-driven way without requiring any labels. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
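The abstract's idea of weighing views by "their different clustering qualities" can be made concrete with a small sketch. It is an illustration under stated assumptions, not the CMVC algorithm: it assumes that "CH" refers to the Calinski-Harabasz validity index, and the phrases, 2-D embeddings, cluster labels, and the helper names `calinski_harabasz` and `canonical_ids` are all hypothetical toy choices. The sketch scores one shared clustering under a "fact view" and a "context view", then assigns each cluster a single identifier.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points: List[Vector], labels: List[int]) -> float:
    """Calinski-Harabasz index: [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster dispersion, W the within-cluster dispersion,
    k the number of clusters, and n the number of points. Higher is better.
    """
    n = len(points)
    clusters: Dict[int, List[Vector]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index needs 2 <= k < n")

    overall = _mean(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = _mean(members)
        between += len(members) * _sq_dist(centroid, overall)
        within += sum(_sq_dist(p, centroid) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def canonical_ids(phrases: List[str], labels: List[int]) -> Dict[str, str]:
    """Give every phrase in the same cluster the same (hypothetical) identifier."""
    return {phrase: f"C{label}" for phrase, label in zip(phrases, labels)}


# Toy data (all hypothetical): six noun phrases, one shared clustering,
# and two embeddings of the same phrases, one per view.
phrases = ["NYC", "New York City", "Big Apple",
           "Tianjin", "Tientsin", "Tianjin City"]
labels = [0, 0, 0, 1, 1, 1]
fact_view = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
context_view = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
                (2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]

ch_fact = calinski_harabasz(fact_view, labels)
ch_context = calinski_harabasz(context_view, labels)
print("fact view CH:", ch_fact)
print("context view CH:", ch_context)
print(canonical_ids(phrases, labels))
```

In this toy setup the fact view separates the two groups more cleanly, so it receives the higher score. How the published method turns such per-view quality scores into a joint clustering is not stated in the abstract and is not modeled here.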
Pages: 1578-1588
Page count: 11