Token list based information search in a multi-dimensional massive database

Authors
Haiying Shen
Ze Li
Ting Li
Affiliations
[1] Clemson University, Department of Electrical and Computer Engineering
[2] MicroStrategy
[3] Wal-mart Stores Inc.
Keywords
Similarity data search; Proximity search; Locality sensitive hash; Database;
Abstract
Finding proximity information is crucial for massive database search. Locality Sensitive Hashing (LSH) is a method for finding nearest neighbors of a query point in a high-dimensional space. It classifies high-dimensional data according to data similarity. However, the “curse of dimensionality” makes LSH insufficiently effective in finding similar data and insufficiently efficient in terms of memory resources and search delays. The contribution of this work is threefold. First, we study a Token List based information Search scheme (TLS) as an alternative to LSH. TLS builds a token list table containing all the unique tokens from the database, and clusters data records having the same token together in one group. Querying is conducted in a small number of groups of relevant data records instead of searching the entire database. Second, in order to decrease the searching time of the token list, we further propose the Optimized Token list based Search schemes (OTS) based on index-tree and hash table structures. An index-tree structure orders the tokens in the token list and constructs an index table based on the tokens. Searching the token list starts from the entry of the token list supplied by the index table. A hash table structure assigns a hash ID to each token. A query token can be directly located in the token list according to its hash ID. Third, since a single-token based method leads to high overhead in the results refinement given a required similarity, we further investigate how a Multi-Token List Search scheme (MTLS) improves the performance of database proximity search. We conducted experiments on the LSH-based searching scheme, TLS, OTS, and MTLS using a massive customer data integration database. The comparison experimental results show that TLS is more efficient than an LSH-based searching scheme, and OTS improves the search efficiency of TLS. 
Further, MTLS performs better than TLS when the number of tokens is appropriately chosen, and a two-token adjacent token list achieves the shortest query delay in our testing dataset.
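The token list idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the Jaccard similarity used for result refinement, and all function names are assumptions made for illustration, since the abstract does not specify them. A Python dict plays the role of the hash table structure, a binary search over sorted tokens stands in for the index-tree lookup, and `n=2` mimics a two-token adjacent token list.

```python
from bisect import bisect_left
from collections import defaultdict


def tokenize(record):
    """Split a record into lowercase tokens (whitespace splitting is an assumption)."""
    return record.lower().split()


def build_token_list(records, n=1):
    """Group record ids under every run of n adjacent tokens they contain.

    n=1 gives a single-token list; n=2 gives a two-token adjacent list.
    """
    table = defaultdict(set)
    for rec_id, record in enumerate(records):
        tokens = tokenize(record)
        for i in range(len(tokens) - n + 1):
            table[tuple(tokens[i:i + n])].add(rec_id)
    return table


def lookup_sorted(sorted_keys, table, key):
    """Locate a key by binary search over ordered keys (stand-in for an index tree)."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return table[key]
    return set()


def jaccard(a, b):
    """Jaccard similarity of two token collections (assumed refinement measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def search(records, table, query, threshold, n=1):
    """Collect records sharing a token group with the query, then refine by similarity."""
    q_tokens = tokenize(query)
    candidates = set()
    for i in range(len(q_tokens) - n + 1):
        # Direct hash lookup: only the relevant groups are touched.
        candidates |= table.get(tuple(q_tokens[i:i + n]), set())
    return sorted(
        rec_id
        for rec_id in candidates
        if jaccard(q_tokens, tokenize(records[rec_id])) >= threshold
    )
```

Calling `search(records, build_token_list(records), query, threshold)` scans only the groups that share a token with the query rather than the whole record set; passing `n=2` to both `build_token_list` and `search` switches to adjacent token pairs, which typically yields smaller candidate groups to refine.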
Pages: 567-594
Page count: 27