Token list based information search in a multi-dimensional massive database

Cited by: 0
Authors
Shen, Haiying [1 ]
Li, Ze [2 ]
Li, Ting [3 ]
Affiliations
[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA
[2] MicroStrategy, Tysons Corner, Fairfax, VA 22182 USA
[3] Wal Mart Stores Inc, Bentonville, AR 72716 USA
Keywords
Similarity data search; Proximity search; Locality sensitive hash; Database; SIMILARITY SEARCH; SPACES;
DOI
10.1007/s10844-013-0289-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Finding proximity information is crucial for massive database search. Locality Sensitive Hashing (LSH) is a method for finding nearest neighbors of a query point in a high-dimensional space. It classifies high-dimensional data according to data similarity. However, the "curse of dimensionality" makes LSH insufficiently effective in finding similar data and insufficiently efficient in terms of memory resources and search delays. The contribution of this work is threefold. First, we study a Token List based information Search scheme (TLS) as an alternative to LSH. TLS builds a token list table containing all the unique tokens from the database, and clusters data records having the same token together in one group. Querying is conducted in a small number of groups of relevant data records instead of searching the entire database. Second, in order to decrease the searching time of the token list, we further propose the Optimized Token list based Search schemes (OTS) based on index-tree and hash table structures. An index-tree structure orders the tokens in the token list and constructs an index table based on the tokens. Searching the token list starts from the entry of the token list supplied by the index table. A hash table structure assigns a hash ID to each token. A query token can be directly located in the token list according to its hash ID. Third, since a single-token based method leads to high overhead in the results refinement given a required similarity, we further investigate how a Multi-Token List Search scheme (MTLS) improves the performance of database proximity search. We conducted experiments on the LSH-based searching scheme, TLS, OTS, and MTLS using a massive customer data integration database. The comparative experimental results show that TLS is more efficient than an LSH-based searching scheme, and OTS improves the search efficiency of TLS. Further, MTLS performs better than TLS when the number of tokens is appropriately chosen, and a two-token adjacent token list achieves the shortest query delay in our testing dataset.
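The token-list idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the Jaccard similarity used for refinement, the `threshold` parameter, and all function names are assumptions made for illustration. A Python dict stands in for the hash-table lookup of a token, and `n=2` builds a list keyed on pairs of adjacent tokens, loosely mirroring the multi-token variant.

```python
from collections import defaultdict


def tokenize(record):
    # Assumption: tokens are lowercase whitespace-separated words.
    return record.lower().split()


def build_token_list(records, n=1):
    """Map each run of n adjacent tokens to the ids of records containing it."""
    table = defaultdict(set)
    for rid, rec in enumerate(records):
        toks = tokenize(rec)
        for i in range(len(toks) - n + 1):
            table[tuple(toks[i:i + n])].add(rid)
    return table


def jaccard(a, b):
    # Assumption: similarity is Jaccard overlap of the token sets.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def search(query, records, table, n=1, threshold=0.5):
    """Collect candidate groups via the token list, then refine by similarity.

    Returns (hits, candidate_count), where hits is a list of
    (record_id, similarity) pairs sorted by descending similarity.
    """
    toks = tokenize(query)
    candidates = set()
    for i in range(len(toks) - n + 1):
        # Dict lookup: the key's hash locates the group directly.
        candidates |= table.get(tuple(toks[i:i + n]), set())
    scored = [(rid, jaccard(toks, tokenize(records[rid]))) for rid in candidates]
    hits = sorted((s for s in scored if s[1] >= threshold),
                  key=lambda s: (-s[1], s[0]))
    return hits, len(candidates)
```

Only records that share at least one key with the query are scored, so the refinement step never touches the rest of the database. Keying on adjacent token pairs tends to shrink the candidate set further, at the cost of missing records that share only isolated tokens with the query.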
Pages: 567-594
Page count: 28
Related papers
50 in total
  • [32] The Multi-Dimensional Information Fusion Community Discovery Based on Topological Potential
    Fei, Rong
    Li, Shasha
    Xu, Qingzheng
    Hu, Bo
    Tang, Yu
    IEEE ACCESS, 2020, 8 : 3224 - 3239
  • [33] Medical Visual Information Retrieval Based on Multi-Dimensional Texture Modeling
    Depeursinge, Adrien
    Müller, Henning
    PROCEEDINGS OF THE 2ND EUROPEAN FUTURE TECHNOLOGIES CONFERENCE AND EXHIBITION 2011 (FET 11), 2011, 7 : 127 - 129
  • [34] A multi-dimensional information sensing and monitoring system based on wearable devices
    Zhang, Zewang
    Wang, Heng
    Gong, Shigui
    Li, Shiwen
    PROCEEDINGS OF 2023 INTERNATIONAL CONFERENCE ON AI AND METAVERSE IN SUPPLY CHAIN MANAGEMENT, AIMSCM 2023, 2023,
  • [35] An OWL Multi-Dimensional Information Security Ontology
    Meriah, Ines
    Rabai, Latifa Ben Arfa
    Khedri, Ridha
    PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON EVALUATION OF NOVEL APPROACHES TO SOFTWARE ENGINEERING, ENASE 2023, 2023, : 372 - 380
  • [36] Visualization of complex multi-dimensional accounting information
    Dull, RB
    Tegarden, DP
    ASSOCIATION FOR INFORMATION SYSTEMS PROCEEDINGS OF THE AMERICAS CONFERENCE ON INFORMATION SYSTEMS, 1998, : 6 - 8
  • [37] A Blotto game with multi-dimensional incomplete information
    Kovenock, Dan
    Roberson, Brian
    ECONOMICS LETTERS, 2011, 113 (03) : 273 - 275
  • [38] MULTI-DIMENSIONAL INDOOR LOCATION INFORMATION MODEL
    Xiong, Qing
    Zhu, Qing
    Zlatanova, Sisi
    Huang, Liang
    Zhou, Yan
    Du, Zhiqiang
    ISPRS ACQUISITION AND MODELLING OF INDOOR AND ENCLOSED ENVIRONMENTS 2013, 2013, 40-4-W4 : 45 - 49
  • [39] Small sample defect recognition method based on multi-dimensional selective search
    Lu S.
    Xu H.
    Zhang R.
    Liu J.
    Zhao K.
    Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2022, 43 (01) : 220 - 228
  • [40] Multi-dimensional evaluation of information retrieval results
    Gao, XZ
    Murugesan, S
    Lo, B
    IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004, : 192 - 198