Exploring attribute correspondences across heterogeneous databases by mutual information

Cited by: 4
Authors
Zhao, HM [1 ]
Soofi, ES [1 ]
Affiliations
[1] Univ Wisconsin, Sch Business Adm, Milwaukee, WI 53201 USA
Keywords
attribute correspondence; attribute matching; composite information systems; database interoperability; heterogeneous databases; information theory; interorganizational systems; mutual information;
DOI
10.2753/MIS0742-1222220411
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Identifying attribute correspondences across heterogeneous databases is a critical and time-consuming step in integrating the databases. Past research has applied correlation analysis techniques to explore correspondences between attributes. These techniques, however, are appropriate only for numeric attributes that are linearly related. This paper proposes an information-theoretic approach to exploring correspondences between attributes in heterogeneous databases. The proposed approach is applicable to character attributes as well as to numeric attributes, regardless of whether they are linearly related. It overcomes some serious shortcomings of previous approaches based on correlation analysis and has much broader applicability. The proposed procedure samples both matching and nonmatching pairs of records from the databases under consideration, applies matching functions to compare pairs of attributes, and then uses mutual information to measure the dependency between a matching function, as applied to a pair of attributes, and the class (i.e., matching or nonmatching) of a pair of records. A high mutual information index implies a potential attribute correspondence, which is presented to the analyst for further evaluation. The paper also presents some empirical results demonstrating the utility of the proposed approach.
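The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple exact-equality matching function and a plug-in (empirical frequency) estimate of mutual information. The names `mutual_information`, `exact_match`, and the sample `pairs` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two
    equally long sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def exact_match(a, b):
    """Hypothetical matching function: 1 if the two values are equal, else 0."""
    return int(a == b)


# Hypothetical sample of record pairs:
# (attribute value in database A, attribute value in database B, class),
# where class 1 = matching pair of records and 0 = nonmatching pair.
pairs = [
    ("Smith", "Smith", 1),
    ("Jones", "Jones", 1),
    ("Lee", "Lee", 1),
    ("Brown", "Braun", 1),  # same entity, spelled differently
    ("Smith", "Jones", 0),
    ("Lee", "Brown", 0),
    ("Jones", "Lee", 0),
    ("Brown", "Smith", 0),
]

outcomes = [exact_match(a, b) for a, b, _ in pairs]
classes = [c for _, _, c in pairs]

# A larger value suggests that the matching function's outcome on this
# attribute pair depends more strongly on the record-pair class.
mi = mutual_information(outcomes, classes)
print(f"mutual information: {mi:.3f} bits")
```

In this sketch, an attribute pair whose matching outcomes are independent of the record-pair class scores zero, while one whose outcomes track the class scores higher and would be the kind of candidate surfaced to the analyst.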
Pages: 305 - 336
Page count: 32
Related papers
50 in total
  • [21] Information retrieval of sequential data in heterogeneous XML databases
    Popovici, E
    Marteau, PF
    Ménier, G
    ADAPTIVE MULTIMEDIA RETRIEVAL: USER, CONTEXT, AND FEEDBACK, 2006, 3877 : 236 - 250
  • [22] Case Retrieval in Medical Databases by Fusing Heterogeneous Information
    Quellec, Gwenole
    Lamard, Mathieu
    Cazuguel, Guy
    Roux, Christian
    Cochener, Beatrice
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2011, 30 (01) : 108 - 118
  • [23] Vibrational scaling of the heterogeneous dynamics detected by mutual information
    Tripodo, Antonio
    Puosi, Francesco
    Malvaldi, Marco
    Leporini, Dino
    EUROPEAN PHYSICAL JOURNAL E, 2019, 42 (11):
  • [25] Mutual Information Preconditioning Improves Bayesian Networks Learning of Medical Databases
    Meloni, A.
    Ripoli, A.
    Positano, V.
    Landini, L.
    WORLD CONGRESS ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING, VOL 25, PT 4: IMAGE PROCESSING, BIOSIGNAL PROCESSING, MODELLING AND SIMULATION, BIOMECHANICS, 2010, 25 : 349 - 352
  • [26] Information entropy based attribute reduction for incomplete heterogeneous data
    Wang, Pei
    Qu, Liangdong
    Zhang, Qinli
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (01) : 219 - 236
  • [27] Learning Representations by Maximizing Mutual Information Across Views
    Bachman, Philip
    Hjelm, R. Devon
    Buchwalter, William
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [28] Attribute reduction based on intuitionistic fuzzy dominance mutual information in intuitionistic fuzzy information systems
    Liu, Xiaofeng
    Mo, Hong
    Dai, Jianhua
    INFORMATION SCIENCES, 2024, 676
  • [29] Detecting antimicrobial peptides by exploring the mutual information of their sequences
    Tripathi, Vijay
    Tripathi, Pooja
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2020, 38 (17) : 5037 - 5043
  • [30] Feasibility of Unified Usage of Heterogeneous Databases Storing Private Information
    Wang, Xin
    Hochin, Teruhisa
    Nomiya, Hiroki
    2013 SECOND IIAI INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2013), 2013, : 337 - 342