A Methodology for Mining Document-Enriched Heterogeneous Information Networks

Cited by: 12
Authors
Grcar, Miha [1]
Trdin, Nejc [1]
Lavrac, Nada [1]
Affiliation
[1] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana 1000, Slovenia
Source
COMPUTER JOURNAL | 2013 / Vol. 56 / Issue 03
Keywords
text mining; heterogeneous information networks; data fusion; classification; centroid-based classifier; diffusion kernels; data visualization;
DOI
10.1093/comjnl/bxs058
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The paper presents a new methodology for mining heterogeneous information networks, motivated by the fact that, in many real-life scenarios, documents are available in heterogeneous information networks, such as interlinked multimedia objects containing titles, descriptions and subtitles. The methodology consists of transforming documents into bag-of-words vectors, decomposing the corresponding heterogeneous network into separate graphs, computing structural-context feature vectors with PageRank, and finally, constructing a common feature vector space in which knowledge discovery is performed. We exploit this feature vector construction process to devise an efficient centroid-based classification algorithm. We demonstrate the approach by applying it to the task of categorizing video lectures. We show that our approach exhibits low time and space complexity without compromising the classification accuracy. In addition, we provide a qualitative analysis of the results by employing a data visualization technique.
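The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents, the single link graph, the restart-to-source (personalized) PageRank variant, the damping factor of 0.85, the equal source weights, and every function name below are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors stay unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def bag_of_words(text, vocab):
    """Unit-length term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return normalize([counts[term] for term in vocab])


def personalized_pagerank(adj, source, damping=0.85, iters=100):
    """Power iteration for PageRank with restarts to `source` (assumed variant).

    `adj` maps each node index to the list of its neighbour indices.
    """
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for u, nbrs in adj.items():
            if nbrs:
                share = damping * rank[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                nxt[source] += damping * rank[u]  # dangling node: send mass to source
        nxt[source] += 1.0 - damping  # restart mass
        rank = nxt
    return normalize(rank)


def combine(vectors, weights):
    """Concatenate weighted per-source vectors into one common feature vector."""
    out = []
    for vec, w in zip(vectors, weights):
        out.extend(w * x for x in vec)
    return normalize(out)


def train_centroids(features, labels):
    """One unit-length centroid per class (normalized sum of member vectors)."""
    sums = {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {label: normalize(acc) for label, acc in sums.items()}


def classify(vec, centroids):
    """Return the class whose centroid has the highest cosine similarity."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


# Toy data (hypothetical): four lecture descriptions and one link graph.
docs = [
    "kernel methods for classification",
    "support vector kernel classification",
    "protein folding biology",
    "gene protein biology lecture",
]
labels = ["ml", "ml", "bio", "bio"]
adj = {0: [1], 1: [0], 2: [3], 3: [2]}  # e.g. lectures sharing an author

vocab = sorted({word for doc in docs for word in doc.lower().split()})
features = [
    combine([bag_of_words(doc, vocab), personalized_pagerank(adj, i)], [0.5, 0.5])
    for i, doc in enumerate(docs)
]

# Train on documents 0 and 2, then classify the held-out documents 1 and 3.
centroids = train_centroids([features[0], features[2]], [labels[0], labels[2]])
predictions = [classify(features[i], centroids) for i in (1, 3)]
print(predictions)
```

With several graphs (for example one per link type), each graph would contribute its own PageRank vector to `combine`, and the weights would control how much each source influences the common feature space.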
Pages: 321-335
Number of pages: 15