Learning from Skewed Class Multi-relational Databases

Citations: 0
Authors
Guo, Hongyu [1]
Viktor, Herna L. [1]
Affiliation
[1] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON K1N 6N5, Canada
Keywords
Multirelational Data Mining; Classification; Multi-view Learning; Relational Database; Imbalanced Classes; Ensemble;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Relational databases, with vast amounts of data (from financial transactions, marketing surveys, and medical records to health informatics observations) and complex schemas, are ubiquitous in our society. Multirelational classification algorithms have been proposed to learn from such relational repositories, where multiple interconnected tables (relations) are involved. These methods search for relevant features both in a target relation (in which each tuple is associated with a class label) and in relations related to the target, in order to better classify target relation tuples. However, in many practical database applications, such as credit card fraud detection and disease diagnosis, the target tuples are highly imbalanced. That is, the number of examples of one class (the majority class) in the target relation is much higher than that of the others (the minority classes). Many existing methods thus tend to produce poor predictive performance on the underrepresented class in the data. This paper presents a strategy to deal with such imbalanced multirelational data. The method learns from multiple views (feature sets) of relational data in order to construct view learners with different awareness of the imbalance problem. The different observations possessed by these view learners are then combined to yield a model with better knowledge of both the majority and minority classes in a relational database. Experiments performed on six benchmark data sets show that the proposed method achieves promising results when compared with other popular relational data mining algorithms, in terms of the ROC curve and AUC value obtained. In particular, an important result indicates that the method is superior when the class imbalance is very high.
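The idea of combining view learners and judging the result by AUC can be illustrated with a minimal sketch. It is not the authors' algorithm: the averaging rule in `combine_views`, the toy labels, and the per-view scores are all hypothetical, chosen only to show how two views that each misrank one minority tuple can complement each other.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the fraction of (minority, majority) pairs ranked correctly.

    Ties between a minority and a majority score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def combine_views(view_scores):
    """Average the minority-class scores of all view learners per tuple.

    Assumption: plain averaging stands in for whatever combination
    scheme the paper actually uses.
    """
    return [mean(per_tuple) for per_tuple in zip(*view_scores)]


# Hypothetical target relation: 2 minority tuples (1), 6 majority tuples (0).
labels = [1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical minority-class scores from two view learners.
view_a = [0.9, 0.3, 0.4, 0.2, 0.1, 0.2, 0.3, 0.1]
view_b = [0.4, 0.8, 0.5, 0.1, 0.2, 0.3, 0.1, 0.2]

combined = combine_views([view_a, view_b])

for name, scores in [("view A", view_a), ("view B", view_b), ("combined", combined)]:
    print(f"{name}: AUC = {auc(labels, scores):.3f}")
```

On this toy data each single view ranks one minority tuple below a majority tuple, while the averaged scores rank both minority tuples above every majority tuple. AUC is used here because, unlike accuracy, it is not dominated by the majority class.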
Pages: 69-94
Page count: 26