A Methodology for Mining Document-Enriched Heterogeneous Information Networks

Cited by: 12
Authors
Grcar, Miha [1]
Trdin, Nejc [1]
Lavrac, Nada [1]
Affiliations
[1] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana 1000, Slovenia
Source
COMPUTER JOURNAL | 2013, Vol. 56, No. 3
Keywords
text mining; heterogeneous information networks; data fusion; classification; centroid-based classifier; diffusion kernels; data visualization;
DOI
10.1093/comjnl/bxs058
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The paper presents a new methodology for mining heterogeneous information networks, motivated by the fact that, in many real-life scenarios, documents are available in heterogeneous information networks, such as interlinked multimedia objects containing titles, descriptions and subtitles. The methodology consists of transforming documents into bag-of-words vectors, decomposing the corresponding heterogeneous network into separate graphs, computing structural-context feature vectors with PageRank, and finally, constructing a common feature vector space in which knowledge discovery is performed. We exploit this feature vector construction process to devise an efficient centroid-based classification algorithm. We demonstrate the approach by applying it to the task of categorizing video lectures. We show that our approach exhibits low time and space complexity without compromising the classification accuracy. In addition, we provide a qualitative analysis of the results by employing a data visualization technique.
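The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy documents, the link graph, the equal block weights, the damping factor of 0.85 and all function names are assumptions made for illustration, and personalised PageRank by power iteration stands in for whatever PageRank variant the paper actually uses. The sketch assumes every node has at least one link.

```python
import numpy as np

def normalise_rows(X):
    """Scale each row to unit L2 norm (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms

def bow_vectors(docs):
    """Bag-of-words: unit-length term-frequency vectors, one row per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for w in d.lower().split():
            X[row, index[w]] += 1.0
    return normalise_rows(X)

def personalised_pagerank(A, source, damping=0.85, iters=100):
    """Power iteration for PageRank with restarts at `source`.
    Assumes every node has at least one outgoing link."""
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    restart = np.zeros(A.shape[0])
    restart[source] = 1.0
    r = restart.copy()
    for _ in range(iters):
        r = damping * (P.T @ r) + (1.0 - damping) * restart
    return r

def structural_vectors(A):
    """One structural-context vector per node: its personalised PageRank."""
    return normalise_rows(np.vstack([personalised_pagerank(A, i)
                                     for i in range(A.shape[0])]))

def combine(blocks, weights):
    """Common feature space: concatenate blocks scaled by sqrt(weight), so a
    dot product equals the weighted sum of per-block cosine similarities."""
    return np.hstack([np.sqrt(w) * B for B, w in zip(blocks, weights)])

def train_centroids(X, labels):
    """Centroid classifier: one unit-length mean vector per class."""
    classes = sorted(set(labels))
    C = np.vstack([X[[i for i, y in enumerate(labels) if y == c]].mean(axis=0)
                   for c in classes])
    return classes, normalise_rows(C)

def predict(X, classes, C):
    """Assign each row to the class whose centroid is most similar."""
    return [classes[j] for j in (X @ C.T).argmax(axis=1)]

# Hypothetical toy data: six "lectures" with short texts and one link graph.
docs = ["neural network training", "deep neural model", "graph learning",
        "protein gene sequence", "gene expression cell", "cell imaging"]
labels = ["ml", "ml", "ml", "bio", "bio", "bio"]
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0                # undirected links

X = combine([bow_vectors(docs), structural_vectors(A)], weights=[0.5, 0.5])
train, test = [0, 1, 3, 4], [2, 5]
classes, C = train_centroids(X[train], [labels[i] for i in train])
predictions = predict(X[test], classes, C)
print(predictions)
```

In this toy setup, document 2 shares no words with any training document, so its predicted class is decided by the structural block alone, which is the kind of case where combining text with network context is meant to help.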
Pages: 321 - 335
Number of pages: 15
Related papers
  • [1] A Methodology for Mining Document-Enriched Heterogeneous Information Networks
    Grcar, Miha
    Lavrac, Nada
    DISCOVERY SCIENCE, 2011, 6926 : 107 - 121
  • [2] Mining Text Enriched Heterogeneous Citation Networks
    Kralj, Jan
    Valmarska, Anita
    Robnik-Sikonja, Marko
    Lavrac, Nada
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART I, 2015, 9077 : 672 - 683
  • [3] Mining Heterogeneous Information Networks: Principles and Methodologies
    Sun, Yizhou
    Han, Jiawei
    Synthesis Lectures on Data Mining and Knowledge Discovery, 2012, 3 (02) : 1 - 161
  • [4] Mining Heterogeneous Information Networks by Exploring the Power of Links
    Han, Jiawei
    DISCOVERY SCIENCE, PROCEEDINGS, 2009, 5808 : 13 - 30
  • [5] KnowSim: A document similarity measure on structured heterogeneous information networks
    School of EECS, Peking University, China
    [Author unknown]
    Proc. IEEE Int. Conf. Data Min. ICDM, 1600, (1015-1020):
  • [6] KnowSim: A Document Similarity Measure on Structured Heterogeneous Information Networks
    Wang, Chenguang
    Song, Yangqiu
    Li, Haoran
    Zhang, Ming
    Han, Jiawei
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, : 1015 - 1020
  • [7] Constructing and Mining Heterogeneous Information Networks from Massive Text
    Shang, Jingbo
    Shen, Jiaming
    Liu, Liyuan
    Han, Jiawei
    KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 3191 - 3192
  • [8] Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks
    Wang, Chenguang
    Song, Yangqiu
    El-Kishky, Ahmed
    Roth, Dan
    Zhang, Ming
    Han, Jiawei
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 1215 - 1224
  • [9] Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks
    Zhuang, Honglei
    Zhang, Jing
    Brova, George
    Tang, Jie
    Cam, Hasan
    Yan, Xifeng
    Han, Jiawei
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014, : 1127 - 1132
  • [10] Meta-Path-Based Search and Mining in Heterogeneous Information Networks
    Yizhou Sun
    Jiawei Han
    Tsinghua Science and Technology, 2013, 18 (04) : 329 - 338