Similarity search in sets and categorical data using the signature tree

Cited by: 12
Authors
Mamoulis, N [1]
Cheung, DW [1]
Lian, W [1]
Affiliation
[1] Univ Hong Kong, Dept Comp Sci & Informat Syst, Hong Kong, Hong Kong, Peoples R China
DOI
10.1109/ICDE.2003.1260783
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data mining applications analyze large collections of set data and high-dimensional categorical data. Search on these data types is not restricted to the classic problems of mining association rules and classification; similarity search is also a frequently applied operation. Access methods for multidimensional numerical data are inappropriate for this problem, and specialized indexes are needed. We propose a method that represents set data as bitmaps (signatures) and organizes them into a hierarchical index suitable for similarity search and other related query types. In contrast to a previous technique, the signature tree is dynamic and does not rely on hardwired constants. Experiments with synthetic and real datasets show that it is robust to different data characteristics, scalable to the database size, and efficient for various queries.
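The abstract's core idea (encode each set as a bitmap signature, then group signatures under a hierarchy whose directory entries summarize their children) can be illustrated with a minimal Python sketch. It assumes Hamming distance (popcount of XOR) as the dissimilarity measure and directory signatures formed by the bitwise OR of child signatures. The names `signature`, `Leaf`, `Node`, `lower_bound`, and `nearest` are hypothetical, and the tree is assembled by hand rather than by the paper's dynamic insertion and split procedures, whose details are not given in this record.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


def signature(items: Iterable[str], universe: List[str]) -> int:
    """Encode a set as a bitmap: bit i is 1 iff universe[i] is in the set."""
    index = {item: i for i, item in enumerate(universe)}
    sig = 0
    for item in items:
        sig |= 1 << index[item]
    return sig


def hamming(a: int, b: int) -> int:
    """Assumed dissimilarity: number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")


@dataclass
class Leaf:
    sig: int      # signature of one stored set
    obj_id: int   # identifier of the stored set (hypothetical)


@dataclass
class Node:
    children: List[Union["Node", Leaf]]
    sig: int = 0

    def __post_init__(self) -> None:
        # Assumption: a directory signature is the bitwise OR of all child signatures.
        for child in self.children:
            self.sig |= child.sig


def lower_bound(query: int, node_sig: int) -> int:
    # A bit set in the query but absent from the OR-ed node signature is absent
    # from every signature below the node, so each such bit is a guaranteed mismatch.
    return bin(query & ~node_sig).count("1")


def nearest(root: Node, query: int) -> Tuple[int, float]:
    """Best-first search for the stored set with minimum Hamming distance to query."""
    best_dist: float = float("inf")
    best_id = -1
    counter = 0  # tie-breaker so heapq never has to compare Node objects
    heap = [(lower_bound(query, root.sig), counter, root)]
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_dist:
            break  # no remaining subtree can hold a closer signature
        for child in node.children:
            if isinstance(child, Leaf):
                d = hamming(query, child.sig)
                if d < best_dist:
                    best_dist, best_id = d, child.obj_id
            else:
                b = lower_bound(query, child.sig)
                if b < best_dist:
                    counter += 1
                    heapq.heappush(heap, (b, counter, child))
    return best_id, best_dist


# Usage on toy market-basket data (illustrative only).
universe = ["bread", "milk", "eggs", "beer", "diapers", "cola"]
baskets = [
    {"bread", "milk"},
    {"beer", "diapers"},
    {"bread", "milk", "eggs"},
    {"cola", "beer"},
]
leaves = [Leaf(signature(b, universe), i) for i, b in enumerate(baskets)]
root = Node([Node(leaves[:2]), Node(leaves[2:])])
query = signature({"bread", "eggs"}, universe)
print(nearest(root, query))  # (2, 1): basket 2 differs from the query in one item
```

The pruning step is safe because every query item missing from a directory's OR-signature is missing from all sets beneath it, so the count of such items can never exceed the true distance to any of them. A real index would also need an insertion and node-splitting policy, which this sketch deliberately omits.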
Pages: 75-86
Page count: 12
Related papers
50 in total
  • [1] Clustering categorical data sets using tabu search techniques
    Ng, MK
    Wong, JC
    PATTERN RECOGNITION, 2002, 35 (12) : 2783 - 2790
  • [3] How to measure similarity for multiple categorical data sets?
    Park, Simon Soon-Hyoung
    Song, Justin JongSu
    Lee, James Jung-Hoon
    Lee, Wookey
    Ree, Sangbok
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (10) : 3489 - 3505
  • [4] Prefix Tree Indexing for Similarity Search and Similarity Joins on Genomic Data
    Rheinlaender, Astrid
    Knobloch, Martin
    Hochmuth, Nicky
    Leser, Ulf
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2010, 6187 : 519 - 536
  • [5] Visual similarity effects in categorical search
    Alexander, Robert G.
    Zelinsky, Gregory J.
    JOURNAL OF VISION, 2011, 11 (08):
  • [6] Visual Similarity Effects in Categorical Search
    Alexander, Robert G.
    Zhang, Wei
    Zelinsky, Gregory J.
    COGNITION IN FLUX, 2010, : 1222 - 1227
  • [7] Categorical Data Skyline Using Classification Tree
    Lee, Wookey
    Song, Justin JongSu
    Leung, Carson K. -S.
    WEB TECHNOLOGIES AND APPLICATIONS, 2011, 6612 : 181 - +
  • [8] Efficient similarity search for tree-structured data
    Li, Guoliang
    Liu, Xuhui
    Feng, Jianhua
    Zhou, Lizhu
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2008, 5069 : 131 - 149
  • [9] Classification of Categorical Data Using Hybrid Similarity Measures
    Hari, Seetha
    Srividya, V. V. R.
    WIRELESS NETWORKS AND COMPUTATIONAL INTELLIGENCE, ICIP 2012, 2012, 292 : 371 - 377
  • [10] Categorical and perceptual similarity effects in visual search
    Yeh, Lu-Chun
    Peelen, Marius V.
    PERCEPTION, 2021, 50 (1_SUPPL) : 111 - 111