Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks

Cited by: 28
Authors
Wang, Chenguang [1]
Song, Yangqiu [2]
El-Kishky, Ahmed [2]
Roth, Dan [2]
Zhang, Ming [1]
Han, Jiawei [2]
Affiliations
[1] Peking Univ, Sch EECS, Beijing, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Urbana, IL USA
Funding
National Natural Science Foundation of China; U.S. National Science Foundation
Keywords
World Knowledge; Heterogeneous Information Network; Document Clustering; Knowledge Base; Knowledge Graph;
DOI
10.1145/2783258.2783374
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One of the key obstacles to making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider a framework that uses world knowledge as indirect supervision. World knowledge is general-purpose knowledge that is not designed for any specific domain. The key challenges are then how to adapt world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of entities and their types, and we represent the data with world knowledge as a heterogeneous information network. We then propose a clustering algorithm that can cluster multiple types and incorporate sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, a collaboratively collected knowledge base about entities and their organizations. The other is YAGO2, a knowledge base that is automatically extracted from Wikipedia and maps knowledge to the linguistic knowledge base WordNet. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.
Pages: 1215-1224
Page count: 10