Taking Advantage of Highly-Correlated Attributes in Similarity Queries with Missing Values

Cited by: 1
Authors
Rodrigues, Lucas Santiago [1]
Cazzolato, Mirela Teixeira [1]
Machado Traina, Agma Juci [1]
Traina Jr, Caetano [1]
Affiliation
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Missing data; Similarity search; Complex; Metric spaces; Imputation
DOI
10.1007/978-3-030-60936-8_13
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Incompleteness harms the quality of content-based retrieval and analysis in similarity queries. Missing data are usually handled by exclusion, or by imputation methods that infer plausible values to fill the gaps. However, such approaches can introduce bias into the data and lose useful information. Similarity queries cannot be performed over incomplete tuples of complex objects, since distance functions are undefined over missing values. We propose the SOLID approach, which allows similarity queries in complex databases without requiring either data imputation or deletion. First, SOLID finds highly-correlated metric spaces. Then, SOLID uses a weighted distance function to search by similarity over tuples of complex objects, using compatibility factors among metric spaces. Experimental results show that SOLID outperforms imputation methods at different missing rates. When querying over incomplete tuples, SOLID was up to 7.3% better than the competitors in quality, reduced the error of similarity searches over incomplete data by 16.42%, and was up to 30.8 times faster than the closest competitor.
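The abstract describes the two SOLID steps only at a high level, so the following is a hypothetical sketch of the general idea, not the authors' algorithm. It assumes each attribute of a tuple is a numeric vector compared with Euclidean distance, that distances in all attributes are on comparable scales, and it uses the Pearson correlation of pairwise distances as a stand-in for SOLID's compatibility factors. When an attribute is missing, the sketch borrows the distance from the most-correlated attribute present in both tuples. The names `distance_correlations`, `weighted_distance`, and the `threshold` parameter are invented for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(a)
    if n < 2:
        return 0.0
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def distance_correlations(tuples, n_attrs):
    """Hypothetical step 1: correlate the pairwise distances of every pair of
    attributes (metric spaces), using only tuples with no missing values."""
    complete = [t for t in tuples if all(v is not None for v in t)]
    pairs = list(combinations(complete, 2))
    dists = [[euclidean(p[i], q[i]) for p, q in pairs] for i in range(n_attrs)]
    corr = [[1.0] * n_attrs for _ in range(n_attrs)]
    for i, j in combinations(range(n_attrs), 2):
        corr[i][j] = corr[j][i] = pearson(dists[i], dists[j])
    return corr


def weighted_distance(x, y, weights, compat, threshold=0.8):
    """Hypothetical step 2: weighted distance between tuples x and y, where
    None marks a missing attribute. A missing attribute i is replaced by the
    distance in the most compatible attribute j present in both tuples
    (compat[i][j] >= threshold), down-weighted by compat[i][j]; if no such
    attribute exists, attribute i is skipped."""
    total, norm = 0.0, 0.0
    for i, w in enumerate(weights):
        if x[i] is not None and y[i] is not None:
            total += w * euclidean(x[i], y[i])
            norm += w
            continue
        candidates = [
            j for j in range(len(weights))
            if j != i and x[j] is not None and y[j] is not None
            and compat[i][j] >= threshold
        ]
        if candidates:
            j = max(candidates, key=lambda k: compat[i][k])
            factor = w * compat[i][j]
            total += factor * euclidean(x[j], y[j])
            norm += factor
    if norm == 0:
        raise ValueError("tuples share no comparable attributes")
    return total / norm
```

The matrix returned by `distance_correlations` can be passed directly as `compat` to `weighted_distance`. The `threshold` stands in for "highly-correlated", and its default of 0.8 is an arbitrary choice for illustration.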
Pages: 168-176
Page count: 9