Token list based information search in a multi-dimensional massive database

Authors
Haiying Shen
Ze Li
Ting Li
Affiliations
[1] Clemson University, Department of Electrical and Computer Engineering
[2] MicroStrategy
[3] Wal-mart Stores Inc.
Keywords
Similarity data search; Proximity search; Locality sensitive hash; Database
DOI
Not available
Abstract
Finding proximity information is crucial for massive database search. Locality Sensitive Hashing (LSH) is a method for finding the nearest neighbors of a query point in a high-dimensional space. It classifies high-dimensional data according to data similarity. However, the "curse of dimensionality" makes LSH insufficiently effective at finding similar data and insufficiently efficient in terms of memory resources and search delays. The contribution of this work is threefold. First, we study a Token List based information Search scheme (TLS) as an alternative to LSH. TLS builds a token list table containing all the unique tokens from the database and clusters the data records that share a token into one group. A query is then run against a small number of groups of relevant data records instead of the entire database. Second, to decrease the time spent searching the token list, we further propose Optimized Token list based Search schemes (OTS) based on index-tree and hash table structures. The index-tree structure orders the tokens in the token list and constructs an index table based on the tokens; a search of the token list starts from the entry supplied by the index table. The hash table structure assigns a hash ID to each token, so a query token can be located directly in the token list by its hash ID. Third, since a single-token based method leads to high overhead when refining results to a required similarity, we further investigate how a Multi-Token List Search scheme (MTLS) improves the performance of database proximity search. We conducted experiments on the LSH-based searching scheme, TLS, OTS, and MTLS using a massive customer data integration database. The comparative experimental results show that TLS is more efficient than the LSH-based searching scheme, and that OTS improves the search efficiency of TLS. Further, MTLS performs better than TLS when the number of tokens is appropriately chosen, and a two-token adjacent token list achieves the shortest query delay in our testing dataset.
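The token list idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The whitespace tokenizer, the Jaccard similarity measure, the 0.5 threshold, the sample records, and all function names are assumptions made for illustration only. A Python dict stands in for the token list table; because a dict is itself a hash table, the lookup loosely resembles the hash-based variant of OTS, while the index-tree variant is not shown. Setting n=2 builds a list keyed on pairs of adjacent tokens, in the spirit of the two-token adjacent list mentioned for MTLS.

```python
from collections import defaultdict


def tokenize(record):
    """Split a record into lowercase whitespace-separated tokens (assumed tokenizer)."""
    return record.lower().split()


def build_token_list(records, n=1):
    """Map each run of n adjacent tokens to the set of record IDs containing it.

    n=1 gives a single-token list; n=2 gives a two-token adjacent list.
    """
    table = defaultdict(set)
    for record_id, record in enumerate(records):
        tokens = tokenize(record)
        for i in range(len(tokens) - n + 1):
            table[tuple(tokens[i:i + n])].add(record_id)
    return table


def jaccard(a, b):
    """Jaccard similarity of two token collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def search(query, records, table, n=1, threshold=0.5):
    """Gather candidates from matching groups only, then refine by similarity.

    Returns the IDs of records that pass the threshold and the number of
    candidates that had to be refined.
    """
    q_tokens = tokenize(query)
    candidates = set()
    for i in range(len(q_tokens) - n + 1):
        candidates |= table.get(tuple(q_tokens[i:i + n]), set())
    hits = [rid for rid in sorted(candidates)
            if jaccard(q_tokens, tokenize(records[rid])) >= threshold]
    return hits, len(candidates)


# Hypothetical sample records, invented for this sketch.
records = [
    "john smith 12 oak street",
    "jon smith 12 oak street",
    "mary jones 7 elm road",
    "john brown 99 pine avenue",
]
single = build_token_list(records, n=1)
pairs = build_token_list(records, n=2)
query = "john smith 12 oak avenue"

print(search(query, records, single, n=1))  # ([0], 3)
print(search(query, records, pairs, n=2))   # ([0], 2)
```

In this toy run both lists return the same matching record, but the two-token list refines two candidates instead of three. That is the kind of reduction in refinement overhead the abstract attributes to MTLS; the actual gains reported in the paper depend on its dataset and are not reproduced here.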
Pages: 567-594
Page count: 27