Indexing Earth Mover's Distance over Network Metrics

Cited by: 3
Authors
Wang, Ting [1]
Meng, Shicong [2]
Bian, Jiang [3]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Facebook Inc, Menlo Pk, CA 94025 USA
[3] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
Earth mover's distance; network metrics; similarity search; delimit and filter
DOI
10.1109/TKDE.2014.2373359
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Earth Mover's Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the plethora of literature, most existing solutions are optimized for L_p feature spaces (e.g., Euclidean space), whereas in a spectrum of applications the relationships between features are better captured using networks. In this paper, we study the problem of answering k-nearest neighbor (k-NN) queries under network-based EMD metrics (NEMD). We propose OASIS, a new access method which leverages the network structure of the feature space and enables efficient NEMD-based similarity search. Specifically, OASIS employs three novel techniques: (i) Range Oracle, a scalable model to estimate the range of the k-th nearest neighbor under NEMD, (ii) Boundary Index, a structure that efficiently fetches candidates within a given range, and (iii) Network Compression Hierarchy, an incremental filtering mechanism that effectively prunes false positive candidates to save unnecessary computation. Through extensive experiments using both synthetic and real data sets, we confirmed that OASIS significantly outperforms the state-of-the-art methods in query processing cost.
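To make the notion of a network-based EMD concrete, the following minimal sketch computes the EMD between two distributions placed on the nodes of a weighted network, with shortest-path length as the ground distance. It covers only the special case where the network is a tree, for which the EMD has a simple closed form: each edge contributes its weight times the absolute mass imbalance of the subtree hanging below it. This is an illustration, not the OASIS method; the function name `tree_emd`, the edge-list format, and the dictionary inputs are assumptions made for this example. A general network would need a min-cost flow solver instead.

```python
from collections import defaultdict


def tree_emd(edges, p, q, root=0):
    """EMD between distributions p and q over the nodes of a weighted tree.

    edges: list of (u, v, weight) tuples forming a tree (assumed format).
    p, q:  dicts mapping node -> mass; both are assumed to sum to the same total.
    The ground distance is the shortest-path length in the tree.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Iterative DFS from the root; records each node's parent and a preorder.
    parent = {root: (None, 0.0)}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, w in adj[u]:
            if v not in parent:
                parent[v] = (u, w)
                stack.append(v)

    # Surplus mass at each node: positive means mass must leave the node.
    surplus = {u: p.get(u, 0.0) - q.get(u, 0.0) for u in order}

    # In reversed preorder every node is visited before its parent, so each
    # subtree's net surplus is complete when it is pushed across the parent edge.
    cost = 0.0
    for u in reversed(order):
        par, w = parent[u]
        if par is not None:
            cost += w * abs(surplus[u])
            surplus[par] += surplus[u]
    return cost


# Small example: node 1 is a hub connected to nodes 0, 2 and 3.
edges = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 4.0)]
p = {0: 0.5, 2: 0.5}
q = {3: 1.0}
print(tree_emd(edges, p, q))
```

In this example, half of the mass travels from node 0 to node 3 (distance 5) and half from node 2 to node 3 (distance 6), so the result is 0.5 * 5 + 0.5 * 6 = 5.5. Nodes that appear in `p` or `q` but are not reachable from `root` are ignored by this sketch.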
Pages: 1588-1601
Number of pages: 14