Network regression with predictive clustering trees

Cited by: 40
Authors
Stojanova, Daniela [1]
Ceci, Michelangelo [2]
Appice, Annalisa [2]
Dzeroski, Saso [1]
Affiliations
[1] Jozef Stefan Inst, Jozef Stefan Int Postgrad Sch, Dept Knowledge Technol, Ctr Excellence Integrated Approaches Chem & Biol, Ljubljana 1000, Slovenia
[2] Univ Bari Aldo Moro, Dipartimento Informat, I-70125 Bari, Italy
Keywords
Autocorrelation; Predictive clustering trees; Regression inference; Network data; Spatial autocorrelation; Dependence
DOI
10.1007/s10618-012-0278-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Network data describe entities represented by nodes, which may be connected with (related to) each other by edges. Many network datasets exhibit a form of autocorrelation, in which the value of a variable at a given node depends on the values of variables at the nodes it is connected with. This phenomenon directly violates the assumption that data are independently and identically distributed. At the same time, it offers a unique opportunity to improve the performance of predictive models on network data, since inferences about one entity can be used to improve inferences about related entities. Regression inference in network data is a challenging task: while many approaches for network classification exist, there are very few approaches for network regression. In this paper, we propose a data mining algorithm, called NCLUS, that explicitly considers autocorrelation when building regression models from network data. The algorithm is based on the concept of predictive clustering trees (PCTs), which can be used for clustering, prediction and multi-target prediction, including multi-target regression and multi-target classification. We evaluate our approach on several real-world network regression problems from the areas of social and spatial networks. Empirical results show that our algorithm outperforms PCTs learned while completely disregarding network information, PCTs that are tailored for spatial data but do not take autocorrelation into account, and a variety of other existing approaches.
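The abstract describes network autocorrelation only informally. The sketch below is a minimal Python/NumPy illustration built on several assumptions. It computes global Moran's I, a standard autocorrelation statistic, over a symmetric weighted adjacency matrix. It then scores a candidate binary split by blending variance reduction, the usual PCT regression heuristic, with the size-weighted autocorrelation of the two children. It is not the NCLUS implementation. The weight matrix `W`, the helper names `morans_i` and `split_score`, the weighting parameter `alpha`, and the mapping of Moran's I onto [0, 1] are all hypothetical choices made for illustration.

```python
import numpy as np


def morans_i(y, W):
    """Global Moran's I of node values y over a weighted network.

    W is an n x n symmetric weight matrix with W[i, j] > 0 when nodes i and j
    are linked and 0 otherwise (zero diagonal). Returns 0.0 when the statistic
    is undefined (no edges among the nodes, or constant values).
    """
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    z = y - y.mean()          # deviations from the mean value
    s0 = W.sum()              # total edge weight
    den = float(z @ z)        # sum of squared deviations
    if s0 == 0 or den == 0:
        return 0.0
    # I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return (y.size / s0) * float(z @ W @ z) / den


def split_score(y, W, mask, alpha=0.5):
    """Hypothetical split heuristic (an assumption, not NCLUS's exact formula):
    alpha * normalized variance reduction
    + (1 - alpha) * size-weighted child Moran's I mapped to [0, 1]."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    n = y.size
    parent_var = y.var()
    children = [np.flatnonzero(mask), np.flatnonzero(~mask)]
    children = [idx for idx in children if len(idx)]  # drop empty sides
    # Variance reduction, scaled to [0, 1] by the parent variance.
    within = sum(len(idx) / n * y[idx].var() for idx in children)
    var_red = (parent_var - within) / parent_var if parent_var > 0 else 0.0
    # Moran's I of each child on its induced subnetwork, weighted by size.
    auto = sum(len(idx) / n * morans_i(y[idx], W[np.ix_(idx, idx)])
               for idx in children)
    # Moran's I lies roughly in [-1, 1]; shift and clip it into [0, 1].
    auto01 = float(np.clip((auto + 1) / 2, 0.0, 1.0))
    return alpha * var_red + (1 - alpha) * auto01


# Usage: a path network 0-1-2-3 with unit edge weights.
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

print(morans_i([1, 1, 5, 5], W))  # ≈ 0.333: linked nodes have similar values
print(morans_i([1, 5, 1, 5], W))  # ≈ -1.0: linked nodes have dissimilar values
print(split_score([1, 1, 5, 5], W, [True, True, False, False]))  # ≈ 0.75
print(split_score([1, 1, 5, 5], W, [True, False, True, False]))  # ≈ 0.25
```

With `alpha = 1` the score reduces to plain variance reduction. Smaller values of `alpha` give more weight to splits whose children keep linked nodes with similar target values together.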
Pages: 378-413 (36 pages)
Journal: Data Mining and Knowledge Discovery, vol. 25, 2012